Tree Search and Monte Carlo
Overview
When simple chain reasoning is insufficient, agents need more systematic search strategies. Tree of Thoughts (ToT), Graph of Thoughts (GoT), and Monte Carlo Tree Search (MCTS) extend LLM reasoning from a "single chain" to a "tree/graph" structure, supporting backtracking, branching, and global optimization. This article provides an in-depth analysis of the principles and implementations of these methods.
1. From Chains to Trees
1.1 Reasoning Structure Spectrum
```mermaid
graph LR
    IO[Input-Output<br/>Direct Answer] --> COT[Chain of Thought<br/>Linear Chain]
    COT --> SC[Self-Consistency<br/>Multi-Chain Voting]
    SC --> TOT[Tree of Thoughts<br/>Tree Search]
    TOT --> GOT[Graph of Thoughts<br/>Graph Search]
    GOT --> MCTS_LLM[MCTS + LLM<br/>Monte Carlo Tree Search]
```
| Method | Structure | Search Strategy | Backtracking | Evaluation |
|---|---|---|---|---|
| IO | Point | None | None | None |
| CoT | Chain | Greedy | None | None |
| Self-Consistency | Multi-chain | Sampling + Voting | None | Majority vote |
| ToT | Tree | BFS/DFS | Yes | LLM evaluation |
| GoT | Graph | Topological sort | Yes + Merge | LLM evaluation |
| MCTS | Tree | UCB-guided | Yes | Simulation + Backpropagation |
2. Tree of Thoughts (ToT)
2.1 Core Idea
ToT, proposed by Yao et al. (2023), models the reasoning process as a search problem over a thought tree:

\[
\text{ToT} = (\mathcal{T}, V, \pi_{\text{search}})
\]

where:
- \(\mathcal{T}\): Thought tree (each node is a partial solution / intermediate thought)
- \(V: \text{Node} \rightarrow \mathbb{R}\): Value function (evaluates the prospect of each thought state)
- \(\pi_{\text{search}}\): Search strategy (BFS or DFS)
2.2 Four Core Components of ToT
```mermaid
graph TD
    subgraph ToT Framework
        DECOMP[1. Thought Decomposition]
        GEN[2. Thought Generator]
        EVAL[3. State Evaluator]
        SEARCH[4. Search Algorithm]
    end
    DECOMP --> GEN
    GEN --> EVAL
    EVAL --> SEARCH
    SEARCH --> |Expand| GEN
    SEARCH --> |Backtrack| GEN
```
Component 1: Thought Decomposition
Decomposes the problem into intermediate "thought steps." The key design decision is granularity:
- Too fine (word or token level): the search space becomes too large
- Too coarse (an entire solution per node): leaves no useful backtracking points
- Just right (sentence or equation level): balances search efficiency and flexibility
Component 2: Thought Generator
Two strategies:
- Sample: independently sample \(k\) thoughts, \(t_i \sim P_{\text{LLM}}(\cdot \mid s), \quad i = 1, \ldots, k\)
- Propose: generate multiple candidates in a single call, \([t_1, t_2, \ldots, t_k] = \text{LLM}(\text{"List } k \text{ possible next steps"} \mid s)\)
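Both strategies can be sketched in a few lines, assuming a hypothetical `llm(prompt, temperature)` callable that returns a string (the prompt wording is illustrative):

```python
# Sketch of the two generation strategies with a hypothetical `llm` callable.

def sample_thoughts(llm, state, k=3):
    """Sample: k independent draws at a higher temperature."""
    return [llm(f"Given the partial solution:\n{state}\nPropose the next step.",
                temperature=0.8)
            for _ in range(k)]

def propose_thoughts(llm, state, k=3):
    """Propose: ask for k distinct candidates in one call, then split them."""
    reply = llm(f"Given the partial solution:\n{state}\n"
                f"List {k} possible next steps, one per line.",
                temperature=0.7)
    return [line.strip() for line in reply.splitlines() if line.strip()][:k]
```

Sampling gives more diverse but possibly redundant thoughts; proposing avoids duplicates at the cost of less independence between candidates.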
Component 3: State Evaluator
Uses the LLM to estimate how promising each thought state is. Two approaches:
- Scoring: rate each state independently, e.g. a numeric score \(V(s) \in \{1, \ldots, 10\}\) or a class in {sure, likely, impossible}
- Voting: present all candidate states together and have the LLM vote for the most promising one
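A minimal sketch of the two evaluation modes, assuming a hypothetical `llm(prompt)` callable; the 1-10 scale and the vote-prompt wording are illustrative choices, not the paper's exact prompts:

```python
import re

def score_state(llm, state):
    """Scoring: rate one state independently on a numeric scale."""
    reply = llm(f"Rate how promising this partial solution is (1-10):\n{state}")
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else 0

def vote_state(llm, states, n_votes=5):
    """Voting: the LLM compares all states and picks the best, n_votes times."""
    listing = "\n".join(f"{i}: {s}" for i, s in enumerate(states))
    votes = [0] * len(states)
    for _ in range(n_votes):
        reply = llm(f"Which candidate is most promising? Reply with its index.\n{listing}")
        match = re.search(r"\d+", reply)
        if match and int(match.group()) < len(states):
            votes[int(match.group())] += 1
    return max(range(len(states)), key=votes.__getitem__)  # winning index
```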
Component 4: Search Algorithm
BFS (Breadth-First Search):

```
maintain a beam of size b
at each level:
    generate k candidates for each state in the beam
    evaluate all candidates
    keep the top-b scoring states for the next level
```

DFS (Depth-First Search):

```
start deep exploration from the root
if the current state's evaluation falls below a threshold → prune and backtrack
if the maximum depth is reached → backtrack
if a solution is found → return it
```
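The BFS variant condenses into a short function; `generate` and `evaluate` are caller-supplied callables (LLM-backed in practice, as in the components above), so this sketch runs with any toy implementations:

```python
# ToT-BFS: generate k candidates per beam state, evaluate, keep the top-b.

def tot_bfs(root, generate, evaluate, depth=3, beam=2, k=3):
    frontier = [root]
    for _ in range(depth):
        # Expand every state in the beam into k candidate thoughts.
        candidates = [c for s in frontier for c in generate(s, k)]
        if not candidates:
            break
        # Keep only the top-b scoring candidates for the next level.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam]
    return max(frontier, key=evaluate)
```

With toy callables, e.g. `generate = lambda s, k: [s + 1, s + 2, s * 2][:k]` over integer states and `evaluate` as the identity, the search greedily keeps the largest values at each level.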
2.3 ToT Example: Game of 24
Task: Use four numbers (1, 5, 6, 10) with basic arithmetic to get 24.
Thought Step 1 (generate 3 candidates):

```
(a) 10 - 1 = 9   (remaining: 5, 6, 9)    Evaluation: Promising ✓
(b) 1 + 5 = 6    (remaining: 6, 6, 10)   Evaluation: Promising ✓
(c) 6 / 1 = 6    (remaining: 5, 6, 10)   Evaluation: Promising ✓
```

Thought Step 2 (expand from (a)):

```
(a1) 5 + 9 = 14  (remaining: 6, 14)      Evaluation: Unlikely ✗
(a2) 6 - 5 = 1   (remaining: 1, 9)       Evaluation: Unlikely ✗
(a3) 9 - 5 = 4   (remaining: 4, 6)       Evaluation: 4 × 6 = 24! ✓✓✓
```

Solution: 10 - 1 = 9, 9 - 5 = 4, 4 × 6 = 24 ✓
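For tasks like this with a crisp success criterion, an exact verifier is cheap to build; a brute-force checker for the Game of 24 (every pairing and operation, recursing on the reduced list):

```python
from itertools import permutations

def solve24(nums, target=24, eps=1e-6):
    """Return True if `target` is reachable from `nums` with + - * /."""
    if len(nums) == 1:
        return abs(nums[0] - target) < eps
    for a, b, *rest in permutations(nums):
        # Combine the first two numbers every possible way; permutations
        # cover both operand orders, so a-b and a/b suffice.
        results = [a + b, a - b, a * b]
        if abs(b) > eps:
            results.append(a / b)
        if any(solve24([r] + rest, target, eps) for r in results):
            return True
    return False
```

Such a verifier is useful as ground truth when measuring ToT accuracy, since it confirms which puzzle instances are solvable at all.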
2.4 ToT Performance
| Task | CoT Accuracy | ToT Accuracy | Improvement |
|---|---|---|---|
| Game of 24 | 7.3% | 74% | +10x |
| Creative Writing | Baseline | +Significant | - |
| Crosswords | 16% | 60% | +3.75x |
3. Graph of Thoughts (GoT)
3.1 From Trees to Graphs
GoT, proposed by Besta et al. (2024), extends reasoning from tree structures to directed acyclic graphs (DAGs):
```mermaid
graph TD
    S[Initial State] --> T1[Thought 1]
    S --> T2[Thought 2]
    S --> T3[Thought 3]
    T1 --> T4[Refinement 1]
    T2 --> T4
    T2 --> T5[Refinement 2]
    T3 --> T5
    T4 --> T6[Merge + Refine]
    T5 --> T6
    T6 --> ANS[Final Answer]
```
Key New Operations:
| Operation | Description | In ToT? |
|---|---|---|
| Generate | Generate new thought nodes | Yes |
| Evaluate | Evaluate node quality | Yes |
| Aggregate | Merge multiple thoughts into one | No |
| Refine | Improve thoughts based on feedback | No |
| Backtrack | Return to previous nodes | Yes |
3.2 Advantages of GoT
- Information merging: Information from different reasoning branches can be combined
- Iterative refinement: Nodes can be improved multiple times
- More flexible control flow: Not limited to tree structures
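The two operations ToT lacks, Aggregate and Refine, reduce to prompt templates over thought nodes; a sketch with a hypothetical `llm(prompt)` callable and illustrative prompt wording:

```python
# GoT-style operations over thought strings (hypothetical `llm` callable).

def aggregate(llm, thoughts):
    """Aggregate: merge several partial solutions into one node."""
    joined = "\n".join(f"- {t}" for t in thoughts)
    return llm(f"Combine these partial solutions into one:\n{joined}")

def refine(llm, thought, feedback):
    """Refine: improve a node based on evaluator feedback."""
    return llm(f"Improve this solution:\n{thought}\nFeedback: {feedback}")
```

In a full GoT controller these operations are scheduled over the DAG (a node may be refined several times, and aggregation creates nodes with multiple parents).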
4. Monte Carlo Tree Search (MCTS) for LLM
4.1 Classic MCTS
MCTS achieved great success in AlphaGo, with the UCB1 formula at its core:

\[
\text{UCB1}(j) = \bar{X}_j + C \sqrt{\frac{\ln n}{n_j}}
\]

where:
- \(\bar{X}_j\): Average reward of node \(j\) (exploitation term)
- \(C \sqrt{\frac{\ln n}{n_j}}\): Exploration bonus (exploration term)
- \(n\): Total visit count of the parent node
- \(n_j\): Visit count of node \(j\)
- \(C\): Exploration constant, controlling exploration-exploitation balance
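The formula transcribes directly into code; the common convention of giving unvisited children infinite priority is included:

```python
import math

def ucb1(mean_reward, parent_visits, child_visits, c=1.414):
    """UCB1 score: exploitation term plus an exploration bonus that
    shrinks as the child is visited more often."""
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    return mean_reward + c * math.sqrt(math.log(parent_visits) / child_visits)
```

Selection then simply descends the tree by picking the child with the highest `ucb1` score at each level.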
4.2 Four Phases of MCTS
```mermaid
graph LR
    SEL[1. Selection<br/>UCB1-Guided] --> EXP[2. Expansion<br/>Add New Nodes]
    EXP --> SIM[3. Simulation<br/>Random/Heuristic]
    SIM --> BP[4. Backpropagation<br/>Update Statistics]
    BP --> SEL
```
- Selection: Starting from root, select child nodes according to UCB1 until reaching a leaf
- Expansion: Add one or more child nodes to the leaf
- Simulation: From the new node, simulate to terminal state (random rollout or fast policy)
- Backpropagation: Propagate simulation results back along the path, updating statistics at each node
4.3 MCTS + LLM
Applying MCTS to LLM reasoning:
| MCTS Component | Board Games | LLM Reasoning |
|---|---|---|
| State | Board position | Partial reasoning chain |
| Action | Move position | Next reasoning step |
| Expansion | Generate legal moves | LLM generates candidate thoughts |
| Simulation | Random games | LLM quickly completes reasoning |
| Evaluation | Win/loss result | LLM evaluates answer correctness |
| Backpropagation | Update win rate | Update node scores |
MCTS-LLM Algorithm:
```
function MCTS_LLM(question, n_iterations):
    root = Node(state=question)
    for i in 1..n_iterations:
        # 1. Selection: UCB1-guided descent to a leaf
        node = select(root)
        # 2. Expansion: LLM generates candidate thought steps
        children = llm.generate_thoughts(node.state, k=3)
        node.add_children(children)
        # 3. Simulation: LLM quickly completes each child to a final answer
        for child in children:
            answer = llm.fast_complete(child.state)
            reward = evaluate(answer, ground_truth)
            # 4. Backpropagation: update statistics along the path
            backpropagate(child, reward)
    # Return the best path found
    return best_path(root)
```
4.4 Insights from AlphaGo
AlphaGo uses a policy network to guide MCTS search, selecting actions by

\[
a = \arg\max_{a} \left( Q(s, a) + c_{\text{puct}} \, P(s, a) \, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)} \right)
\]

where \(P(s, a)\) is the policy network's prior probability, \(Q(s, a)\) is the action value, and \(N(s, a)\) is the visit count of action \(a\).
Analogy to LLM:
- Policy network → LLM's prior probability (which reasoning steps are more likely correct)
- Value network → LLM's evaluation (how far the current reasoning state is from the correct answer)
- MCTS search → Systematically exploring the reasoning space
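The prior-guided selection score (the PUCT rule used in AlphaGo) combines these pieces; a sketch, with `c_puct` as the exploration constant:

```python
import math

def puct(q, prior, parent_visits, child_visits, c_puct=1.0):
    """PUCT score: action value plus a prior-weighted exploration bonus
    that decays as the action is visited more often."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u
```

In the LLM analogy, `prior` would come from the model's probability of a reasoning step and `q` from the evaluator's estimate of that branch's success rate.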
5. BFS vs. DFS Strategy Comparison
5.1 Applicability Analysis
| Strategy | Pros | Cons | Applicable Scenarios |
|---|---|---|---|
| BFS | Won't miss shallow solutions, efficient beam search | High memory, limited depth | Solutions at shallow levels, need to compare multiple plans |
| DFS | Memory efficient, can explore deep | May get stuck in dead ends | Solutions at deep levels, large search space |
| MCTS | Exploration-exploitation balance, asymptotically optimal | Requires many iterations | Uncertain evaluation functions, need global optimization |
| Best-First | Always expands most promising | Heuristic may mislead | Reliable heuristic function available |
5.2 Search Visualization
```mermaid
graph TD
    subgraph BFS - Breadth First
        R1[Root] --> A1[A]
        R1 --> B1[B]
        R1 --> C1[C]
        A1 --> D1[D]
        A1 --> E1[E]
        B1 --> F1[F]
    end
```
BFS expands level by level: Root → A,B,C → D,E,F → ...
DFS explores depth-first: Root → A → D → (backtrack) → E → (backtrack) → B → F → ...
6. Practical Considerations
6.1 Computational Cost Analysis
Let the cost per LLM call be \(c\), with search parameters:
| Method | LLM Calls | Total Cost |
|---|---|---|
| CoT | 1 | \(c\) |
| Self-Consistency (K=5) | 5 | \(5c\) |
| ToT-BFS (d levels, b beam, k candidates) | \(d \times b \times k\) (generation) + \(d \times b \times k\) (evaluation) | \(O(dbk) \cdot c\) |
| MCTS (N iterations) | \(N \times (\text{expand} + \text{simulate} + \text{evaluate})\) | \(O(N) \cdot 3c\) |
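The call counts in the table can be wrapped in a small helper (costs are in units of one LLM call c; the MCTS figure assumes roughly 3 calls per iteration, as the table does):

```python
def llm_calls(method, **p):
    """Number of LLM calls for each search method (table above)."""
    if method == "cot":
        return 1
    if method == "self_consistency":
        return p["k"]                            # k sampled chains
    if method == "tot_bfs":
        return 2 * p["depth"] * p["beam"] * p["k"]  # generation + evaluation
    if method == "mcts":
        return 3 * p["n_iterations"]             # expand + simulate + evaluate
    raise ValueError(f"unknown method: {method}")
```

For example, a modest ToT-BFS run with depth 3, beam 2, and 3 candidates already costs 36 calls, roughly 7x a 5-chain self-consistency run.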
6.2 When to Use Tree Search?
Worth using:
- Problems with clear evaluation criteria (math, code)
- Tasks requiring creative exploration (writing, design)
- Large but structured solution spaces (planning, strategy)
Not worth using:
- Simple factual queries (direct answer suffices)
- Unreliable evaluation functions (subjective questions)
- High latency requirements (interactive applications)
7. Frontier Developments
- RAP (Reasoning via Planning): Hao et al. (2023) applied MCTS to LLM reasoning
- Everything of Thoughts (XoT): A framework unifying CoT, ToT, and GoT
- Language Agent Tree Search (LATS): Combining ReAct and MCTS for agent search
- Process Reward Models (PRM): Providing fine-grained reward signals for each reasoning step
References
- Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023.
- Besta, M. et al. (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. AAAI 2024.
- Hao, S. et al. (2023). Reasoning with Language Model is Planning with World Model. EMNLP 2023.
- Silver, D. et al. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489.
- Browne, C. et al. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE TCIAIG, 4(1), 1-43.
- Zhou, A. et al. (2023). Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models. arXiv:2310.04406.