
Tree Search and Monte Carlo

Overview

When simple chain reasoning is insufficient, agents need more systematic search strategies. Tree of Thoughts (ToT), Graph of Thoughts (GoT), and Monte Carlo Tree Search (MCTS) extend LLM reasoning from a "single chain" to a "tree/graph" structure, supporting backtracking, branching, and global optimization. This article provides an in-depth analysis of the principles and implementations of these methods.


1. From Chains to Trees

1.1 Reasoning Structure Spectrum

```mermaid
graph LR
    IO[Input-Output<br/>Direct Answer] --> COT[Chain of Thought<br/>Linear Chain]
    COT --> SC[Self-Consistency<br/>Multi-Chain Voting]
    SC --> TOT[Tree of Thoughts<br/>Tree Search]
    TOT --> GOT[Graph of Thoughts<br/>Graph Search]
    GOT --> MCTS_LLM[MCTS + LLM<br/>Monte Carlo Tree Search]
```
| Method | Structure | Search Strategy | Backtracking | Evaluation |
| --- | --- | --- | --- | --- |
| IO | Point | None | None | None |
| CoT | Chain | Greedy | None | None |
| Self-Consistency | Multi-chain | Sampling + Voting | None | Majority vote |
| ToT | Tree | BFS/DFS | Yes | LLM evaluation |
| GoT | Graph | Topological sort | Yes + Merge | LLM evaluation |
| MCTS | Tree | UCB-guided | Yes | Simulation + Backpropagation |

2. Tree of Thoughts (ToT)

2.1 Core Idea

ToT, proposed by Yao et al. (2023), models the reasoning process as a search problem in a thought tree:

\[ \text{ToT}: \text{Search}(\mathcal{T}, V, \pi_{\text{search}}) \]

where:

  • \(\mathcal{T}\): Thought tree (each node is a partial solution / intermediate thought)
  • \(V: \text{Node} \rightarrow \mathbb{R}\): Value function (evaluates the prospect of each thought state)
  • \(\pi_{\text{search}}\): Search strategy (BFS or DFS)

2.2 Four Core Components of ToT

```mermaid
graph TD
    subgraph ToT Framework
        DECOMP[1. Thought Decomposition]
        GEN[2. Thought Generator]
        EVAL[3. State Evaluator]
        SEARCH[4. Search Algorithm]
    end

    DECOMP --> GEN
    GEN --> EVAL
    EVAL --> SEARCH
    SEARCH --> |Expand| GEN
    SEARCH --> |Backtrack| GEN
```

Component 1: Thought Decomposition

Decomposes the problem into intermediate "thought steps." The key design decision is granularity:

  • Too fine (word level): The search space becomes intractably large
  • Too coarse (whole-solution level): Too few meaningful backtracking points
  • Just right (sentence to short-paragraph level): Balances search efficiency and flexibility

Component 2: Thought Generator

Two strategies:

  • Sample: Independently sample \(k\) thoughts: \(t_i \sim P_{\text{LLM}}(\cdot \mid s), \quad i = 1, \ldots, k\)
  • Propose: Generate multiple candidates at once: \([t_1, t_2, \ldots, t_k] = \text{LLM}(\text{"List k possible next steps"} \mid s)\)

Component 3: State Evaluator

Uses the LLM to evaluate the prospect of each thought state:

Scoring approach:

\[ V(s) = \text{LLM}(\text{"Rate the prospect of this partial solution (1-10)"} \mid s) \]

Voting approach:

\[ V(s) = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}[\text{LLM}_k(\text{"Is this direction promising?"} \mid s) = \text{"yes"}] \]
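
Both evaluation styles can be sketched as follows. The `llm_score` and `llm_vote` functions are hypothetical stubs standing in for real model calls, not any particular API:

```python
# Sketch of the two ToT state-evaluation styles. The llm_* functions are
# hypothetical stand-ins for real model calls.

def llm_score(state: str) -> float:
    """Stub: a real implementation would prompt the model to rate 1-10."""
    return 7.0  # placeholder rating

def llm_vote(state: str, k: int) -> str:
    """Stub: a real implementation would sample a yes/no judgment."""
    return "yes" if k % 2 == 0 else "no"  # placeholder votes

def value_by_scoring(state: str) -> float:
    # V(s): a single scalar rating of the partial solution
    return llm_score(state)

def value_by_voting(state: str, K: int = 5) -> float:
    # V(s): fraction of K independent samples that answer "yes"
    votes = [llm_vote(state, k) for k in range(K)]
    return sum(v == "yes" for v in votes) / K

print(value_by_voting("10 - 1 = 9, remaining: 5 6 9"))  # 0.6 with the stub votes
```

Voting is more robust than a single scalar rating when the model's numeric calibration is poor, at the cost of K times as many calls.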

Component 4: Search Algorithm

BFS (Breadth-First Search):

```
Maintain a beam of size b
At each level:
  Generate k candidates for each state in the beam
  Evaluate all candidates
  Keep the top-b scoring ones for the next level
```

DFS (Depth-First Search):

```
Start deep exploration from root
If current state evaluation falls below threshold → prune and backtrack
If maximum depth reached → backtrack
If solution found → return
```
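
The BFS variant above can be sketched as a generic beam search. Here `generate` and `evaluate` are caller-supplied stand-ins for the thought generator and state evaluator (both would be LLM calls in practice):

```python
# Minimal ToT beam search (BFS variant) sketch; generate and evaluate
# would both be LLM calls in a real system.
from typing import Callable

def tot_bfs(root: str,
            generate: Callable[[str], list[str]],  # k candidate thoughts per state
            evaluate: Callable[[str], float],      # value function V(s)
            depth: int, beam: int) -> list[str]:
    frontier = [root]
    for _ in range(depth):
        # Expand every state in the beam, then keep the top-b candidates.
        candidates = [c for s in frontier for c in generate(s)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam]
    return frontier

# Toy usage: states are digit strings; more 9s scores higher.
gen = lambda s: [s + d for d in "789"]
val = lambda s: s.count("9")
print(tot_bfs("", gen, val, depth=3, beam=2))  # '999' ranks first in the beam
```

Note that the cost per level is `beam × k` generation calls plus as many evaluation calls, which matches the cost table in Section 6.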

2.3 ToT Example: Game of 24

Task: Use four numbers (1, 5, 6, 10) with basic arithmetic to get 24.

```
Thought Step 1 (generate 3 candidates):
  (a) 10 - 1 = 9  (remaining: 5, 6, 9)  Evaluation: Promising ✓
  (b) 1 + 5 = 6   (remaining: 6, 6, 10) Evaluation: Promising ✓
  (c) 6 / 1 = 6   (remaining: 5, 6, 10) Evaluation: Promising ✓

Thought Step 2 (expand from (a)):
  (a1) 5 + 9 = 14  (remaining: 6, 14)  Evaluation: Unlikely ✗
  (a2) 6 - 5 = 1   (remaining: 1, 9)   Evaluation: Unlikely ✗
  (a3) 9 - 5 = 4   (remaining: 4, 6)   Evaluation: 4×6=24! ✓✓✓

Solution: 10-1=9, 9-5=4, 4×6=24 ✓
```
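
For this task the evaluator can even be exact: a small brute-force checker (no LLM involved) confirms whether 24 is reachable at all, which is handy as a ground-truth oracle when experimenting with ToT:

```python
# Brute-force verifier for Game of 24: repeatedly combine any two numbers
# with +, -, *, / and recurse until one number remains.
from itertools import permutations

def solvable_24(nums, target=24, eps=1e-6):
    if len(nums) == 1:
        return abs(nums[0] - target) < eps
    for a, b, *rest in permutations(nums):
        results = [a + b, a - b, a * b]
        if abs(b) > eps:          # avoid division by zero
            results.append(a / b)
        if any(solvable_24([r] + rest, target, eps) for r in results):
            return True
    return False

print(solvable_24([1, 5, 6, 10]))  # True: 10-1=9, 9-5=4, 4*6=24
```

With four numbers the search space is tiny, so this runs instantly; the point of ToT is that the LLM evaluator approximates such an oracle on tasks where brute force is impossible.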

2.4 ToT Performance

| Task | CoT Accuracy | ToT Accuracy | Improvement |
| --- | --- | --- | --- |
| Game of 24 | 7.3% | 74% | ~10× |
| Creative Writing | Baseline | Significantly better | - |
| Crosswords | 16% | 60% | ~3.75× |

3. Graph of Thoughts (GoT)

3.1 From Trees to Graphs

GoT, proposed by Besta et al. (2024), extends reasoning from tree structures to directed acyclic graphs (DAGs):

```mermaid
graph TD
    S[Initial State] --> T1[Thought 1]
    S --> T2[Thought 2]
    S --> T3[Thought 3]
    T1 --> T4[Refinement 1]
    T2 --> T4
    T2 --> T5[Refinement 2]
    T3 --> T5
    T4 --> T6[Merge + Refine]
    T5 --> T6
    T6 --> ANS[Final Answer]
```

Key New Operations:

| Operation | Description | In ToT? |
| --- | --- | --- |
| Generate | Generate new thought nodes | Yes |
| Evaluate | Evaluate node quality | Yes |
| Aggregate | Merge multiple thoughts into one | No |
| Refine | Improve thoughts based on feedback | No |
| Backtrack | Return to previous nodes | Yes |

3.2 Advantages of GoT

  1. Information merging: Information from different reasoning branches can be combined
  2. Iterative refinement: Nodes can be improved multiple times
  3. More flexible control flow: Not limited to tree structures
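
These operations can be sketched on a tiny DAG of thought nodes. The `merge_fn` and `refine_fn` lambdas below are illustrative placeholders for what would be LLM prompts ("combine these drafts", "improve this draft") in GoT proper:

```python
# Sketch of the two GoT operations absent from ToT: Aggregate and Refine.
# The merge/refine functions here are illustrative placeholders for LLM calls.

class Thought:
    def __init__(self, content, parents=()):
        self.content = content
        self.parents = list(parents)  # DAG edges: a node may have many parents

def aggregate(thoughts, merge_fn):
    # Aggregate: merge several branches into one node (impossible in a tree).
    return Thought(merge_fn([t.content for t in thoughts]), parents=thoughts)

def refine(thought, refine_fn):
    # Refine: create a new node that improves its single parent.
    return Thought(refine_fn(thought.content), parents=[thought])

a = Thought("sort left half")
b = Thought("sort right half")
merged = aggregate([a, b], merge_fn=lambda cs: " + ".join(cs))
final = refine(merged, refine_fn=lambda c: c + ", then merge results")
print(final.content)  # "sort left half + sort right half, then merge results"
```

The multi-parent `parents` list is exactly what distinguishes the graph from a tree: `merged` keeps edges back to both branches it combined.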

4. Monte Carlo Tree Search (MCTS) for LLM

4.1 Classic MCTS

MCTS achieved great success in AlphaGo, with the UCB1 formula at its core:

\[ UCB1(j) = \bar{X}_j + C \sqrt{\frac{\ln n}{n_j}} \]

where:

  • \(\bar{X}_j\): Average reward of node \(j\) (exploitation term)
  • \(C \sqrt{\frac{\ln n}{n_j}}\): Exploration bonus (exploration term)
  • \(n\): Total visit count of the parent node
  • \(n_j\): Visit count of node \(j\)
  • \(C\): Exploration constant, controlling exploration-exploitation balance
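
The formula translates directly to code; the default 1.414 (≈ √2) is a common choice for \(C\), not a universal constant:

```python
# UCB1 exactly as in the formula above: mean reward plus an exploration
# bonus that shrinks as a child is visited more often.
import math

def ucb1(mean_reward: float, n_parent: int, n_child: int, c: float = 1.414) -> float:
    if n_child == 0:
        return float("inf")  # unvisited children are always tried first
    return mean_reward + c * math.sqrt(math.log(n_parent) / n_child)

# A rarely visited child can outrank one with a higher average reward:
print(ucb1(0.5, n_parent=100, n_child=50))  # ~0.93
print(ucb1(0.3, n_parent=100, n_child=2))   # ~2.45
```

This is the exploration-exploitation trade-off in one line: the second child has the worse average but the far larger bonus, so it gets selected next.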

4.2 Four Phases of MCTS

```mermaid
graph LR
    SEL[1. Selection<br/>UCB1-Guided] --> EXP[2. Expansion<br/>Add New Nodes]
    EXP --> SIM[3. Simulation<br/>Random/Heuristic]
    SIM --> BP[4. Backpropagation<br/>Update Statistics]
    BP --> SEL
```

  1. Selection: Starting from root, select child nodes according to UCB1 until reaching a leaf
  2. Expansion: Add one or more child nodes to the leaf
  3. Simulation: From the new node, simulate to terminal state (random rollout or fast policy)
  4. Backpropagation: Propagate simulation results back along the path, updating statistics at each node

4.3 MCTS + LLM

Applying MCTS to LLM reasoning:

| MCTS Component | Board Games | LLM Reasoning |
| --- | --- | --- |
| State | Board position | Partial reasoning chain |
| Action | A move on the board | Next reasoning step |
| Expansion | Generate legal moves | LLM generates candidate thoughts |
| Simulation | Random playouts | LLM quickly completes the reasoning |
| Evaluation | Win/loss result | LLM evaluates answer correctness |
| Backpropagation | Update win rate | Update node scores |

MCTS-LLM Algorithm:

```
function MCTS_LLM(question, n_iterations):
    root = Node(state=question)

    for i in 1..n_iterations:
        # 1. Selection: UCB1 guided
        node = select(root)

        # 2. Expansion: LLM generates candidate thought steps
        children = llm.generate_thoughts(node.state, k=3)
        node.add_children(children)

        # 3. Simulation: LLM quickly completes to a final answer
        for child in children:
            answer = llm.fast_complete(child.state)
            reward = evaluate(answer, ground_truth)

            # 4. Backpropagation
            backpropagate(child, reward)

    # Return best path
    return best_path(root)
```
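
A minimal runnable instantiation of this pseudocode might look as follows, with a `FakeLLM` stub in place of real model calls and a toy task (build a 4-digit string, rewarded by its number of 9s). All names here are illustrative, not a reference implementation:

```python
# Toy MCTS-LLM: FakeLLM stands in for generate/complete calls.
import math, random

class FakeLLM:
    def generate_thoughts(self, state, k=3):
        return [state + random.choice("0123456789") for _ in range(k)]
    def fast_complete(self, state):
        return state + "".join(random.choice("0123456789") for _ in range(4 - len(state)))

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0
    def ucb1(self, c=1.414):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def select(root):                      # 1. Selection: follow UCB1 to a leaf
    node = root
    while node.children:
        node = max(node.children, key=Node.ucb1)
    return node

def backpropagate(node, reward):       # 4. Backpropagation: update the path
    while node:
        node.visits += 1
        node.value += reward
        node = node.parent

def mcts(question, llm, n_iterations=200):
    root = Node(question)
    for _ in range(n_iterations):
        node = select(root)
        if len(node.state) < 4:        # 2. Expansion at non-terminal leaves
            for s in llm.generate_thoughts(node.state):
                node.children.append(Node(s, parent=node))
            for child in node.children:
                answer = llm.fast_complete(child.state)   # 3. Simulation
                backpropagate(child, answer.count("9"))
        else:
            backpropagate(node, node.state.count("9"))
    best = root                        # return the most-visited path's leaf
    while best.children:
        best = max(best.children, key=lambda n: n.visits)
    return best.state

random.seed(0)
print(mcts("", FakeLLM()))
```

Selecting the final answer by visit count rather than by value is the standard MCTS convention: visits are a more robust statistic than a mean that may rest on few samples.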

4.4 Insights from AlphaGo

AlphaGo uses a policy network to guide MCTS search:

\[ a^* = \arg\max_a \left[ Q(s, a) + c_{\text{puct}} \cdot P(s, a) \cdot \frac{\sqrt{N(s)}}{1 + N(s, a)} \right] \]

where \(P(s, a)\) is the policy network's prior probability and \(Q(s, a)\) is the action value.

Analogy to LLM:

  • Policy network → LLM's prior probability (which reasoning steps are more likely correct)
  • Value network → LLM's evaluation (how far the current reasoning state is from the correct answer)
  • MCTS search → Systematically exploring the reasoning space
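
The PUCT rule above is likewise a one-liner; the example shows how the prior breaks ties between actions with equal value estimates:

```python
# PUCT selection as in the AlphaGo formula: the prior P(s,a) biases
# exploration toward actions the policy network already favors.
import math

def puct(q: float, prior: float, n_parent: int, n_action: int, c_puct: float = 1.0) -> float:
    return q + c_puct * prior * math.sqrt(n_parent) / (1 + n_action)

# Two candidate steps with equal value estimates: the higher prior wins.
a = puct(q=0.5, prior=0.6, n_parent=100, n_action=10)
b = puct(q=0.5, prior=0.1, n_parent=100, n_action=10)
print(a > b)  # True
```

In the LLM analogy, `prior` would come from the model's own token probabilities over candidate reasoning steps, and `q` from the evaluator's running average.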

5. BFS vs. DFS Strategy Comparison

5.1 Applicability Analysis

| Strategy | Pros | Cons | Applicable Scenarios |
| --- | --- | --- | --- |
| BFS | Won't miss shallow solutions, efficient beam search | High memory, limited depth | Solutions at shallow levels, need to compare multiple plans |
| DFS | Memory efficient, can explore deep | May get stuck in dead ends | Solutions at deep levels, large search space |
| MCTS | Exploration-exploitation balance, asymptotically optimal | Requires many iterations | Uncertain evaluation functions, need global optimization |
| Best-First | Always expands the most promising node | Heuristic may mislead | Reliable heuristic function available |

5.2 Search Visualization

```mermaid
graph TD
    subgraph BFS - Breadth First
        R1[Root] --> A1[A]
        R1 --> B1[B]
        R1 --> C1[C]
        A1 --> D1[D]
        A1 --> E1[E]
        B1 --> F1[F]
    end
```

BFS expands level by level: Root → A,B,C → D,E,F → ...

DFS explores depth-first: Root → A → D → (backtrack) → E → (backtrack) → B → F → ...


6. Practical Considerations

6.1 Computational Cost Analysis

Let the cost per LLM call be \(c\), with search parameters:

| Method | LLM Calls | Total Cost |
| --- | --- | --- |
| CoT | 1 | \(c\) |
| Self-Consistency (K=5) | 5 | \(5c\) |
| ToT-BFS (d levels, beam b, k candidates) | \(d \cdot b \cdot k\) (generation) + \(d \cdot b \cdot k\) (evaluation) | \(O(dbk) \cdot c\) |
| MCTS (N iterations) | \(N \times\) (expand + simulate + evaluate) | \(O(N) \cdot 3c\) |
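
Plugging in concrete numbers makes the gap tangible; the parameter values below are arbitrary examples, not recommendations:

```python
# Concrete LLM-call counts for the cost table above, with example
# parameters: d=3 levels, beam b=5, k=5 candidates, K=5 samples, N=50 iterations.
d, b, k, K, N = 3, 5, 5, 5, 50

cot_calls = 1
self_consistency_calls = K
tot_bfs_calls = d * b * k * 2   # generation + evaluation at every level
mcts_calls = N * 3              # expand + simulate + evaluate per iteration

print(cot_calls, self_consistency_calls, tot_bfs_calls, mcts_calls)  # 1 5 150 150
```

A 150× cost multiplier over plain CoT is why the "worth using / not worth using" distinction below matters in practice.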

Worth using:

  • Problems with clear evaluation criteria (math, code)
  • Tasks requiring creative exploration (writing, design)
  • Large but structured solution spaces (planning, strategy)

Not worth using:

  • Simple factual queries (direct answer suffices)
  • Unreliable evaluation functions (subjective questions)
  • High latency requirements (interactive applications)

7. Frontier Developments

  1. RAP (Reasoning via Planning): Hao et al. (2023) applied MCTS to LLM reasoning
  2. Everything of Thoughts (XoT): A framework unifying CoT, ToT, and GoT
  3. Language Agent Tree Search (LATS): Combining ReAct and MCTS for agent search
  4. Process Reward Models (PRM): Providing fine-grained reward signals for each reasoning step

References

  1. Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023.
  2. Besta, M. et al. (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. AAAI 2024.
  3. Hao, S. et al. (2023). Reasoning with Language Model is Planning with World Model. EMNLP 2023.
  4. Silver, D. et al. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489.
  5. Browne, C. et al. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE TCIAIG, 4(1), 1-43.
  6. Zhou, A. et al. (2023). Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models. arXiv:2310.04406.
