Tree Search and Monte Carlo
Overview
When simple chain reasoning is insufficient, agents need more systematic search strategies. Tree of Thoughts (ToT), Graph of Thoughts (GoT), and Monte Carlo Tree Search (MCTS) extend LLM reasoning from a "single chain" to a "tree/graph" structure, supporting backtracking, branching, and global optimization. This article provides an in-depth analysis of the principles and implementations of these methods.
1. From Chains to Trees
1.1 Reasoning Structure Spectrum
```mermaid
graph LR
    IO[Input-Output<br/>Direct Answer] --> COT[Chain of Thought<br/>Linear Chain]
    COT --> SC[Self-Consistency<br/>Multi-Chain Voting]
    SC --> TOT[Tree of Thoughts<br/>Tree Search]
    TOT --> GOT[Graph of Thoughts<br/>Graph Search]
    GOT --> MCTS_LLM[MCTS + LLM<br/>Monte Carlo Tree Search]
```
| Method | Structure | Search Strategy | Backtracking | Evaluation |
|---|---|---|---|---|
| IO | Point | None | None | None |
| CoT | Chain | Greedy | None | None |
| Self-Consistency | Multi-chain | Sampling + Voting | None | Majority vote |
| ToT | Tree | BFS/DFS | Yes | LLM evaluation |
| GoT | Graph | Topological sort | Yes + Merge | LLM evaluation |
| MCTS | Tree | UCB-guided | Yes | Simulation + Backpropagation |
2. Tree of Thoughts (ToT)
2.1 Core Idea
ToT, proposed by Yao et al. (2023), models the reasoning process as a search problem over a thought tree:

\[
\text{ToT} = (\mathcal{T}, V, \pi_{\text{search}})
\]

where:
- \(\mathcal{T}\): Thought tree (each node is a partial solution / intermediate thought)
- \(V: \text{Node} \rightarrow \mathbb{R}\): Value function (evaluates the prospect of each thought state)
- \(\pi_{\text{search}}\): Search strategy (BFS or DFS)
2.2 Four Core Components of ToT
```mermaid
graph TD
    subgraph ToT Framework
        DECOMP[1. Thought Decomposition]
        GEN[2. Thought Generator]
        EVAL[3. State Evaluator]
        SEARCH[4. Search Algorithm]
    end
    DECOMP --> GEN
    GEN --> EVAL
    EVAL --> SEARCH
    SEARCH --> |Expand| GEN
    SEARCH --> |Backtrack| GEN
```
Component 1: Thought Decomposition
Decomposes the problem into intermediate "thought steps." The key design decision is granularity:
- Too fine (word or token level): the search space becomes too large
- Too coarse (an entire solution per node): leaves no useful backtracking points
- Just right (sentence or equation level): balances search efficiency and flexibility
Component 2: Thought Generator
Two strategies:
- Sample: independently sample \(k\) thoughts, \(t_i \sim P_{\text{LLM}}(\cdot \mid s), \quad i = 1, \ldots, k\)
- Propose: generate multiple candidates in a single call, \([t_1, t_2, \ldots, t_k] = \text{LLM}(\text{"List } k \text{ possible next steps"} \mid s)\)
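Both strategies can be sketched in a few lines, assuming a hypothetical `llm(prompt, temperature)` callable that returns a string (the prompt wording is illustrative):

```python
# Sketch of the two generation strategies with a hypothetical `llm` callable.

def sample_thoughts(llm, state, k=3):
    """Sample: k independent draws at a higher temperature."""
    return [llm(f"Given the partial solution:\n{state}\nPropose the next step.",
                temperature=0.8)
            for _ in range(k)]

def propose_thoughts(llm, state, k=3):
    """Propose: ask for k distinct candidates in one call, then split them."""
    reply = llm(f"Given the partial solution:\n{state}\n"
                f"List {k} possible next steps, one per line.",
                temperature=0.7)
    return [line.strip() for line in reply.splitlines() if line.strip()][:k]
```

Sampling gives more diverse but possibly redundant thoughts; proposing avoids duplicates at the cost of less independence between candidates.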
Component 3: State Evaluator
Uses the LLM to estimate how promising each thought state is. Two approaches:
- Scoring: rate each state independently, e.g. a numeric score \(V(s) \in \{1, \ldots, 10\}\) or a class in {sure, likely, impossible}
- Voting: present all candidate states together and have the LLM vote for the most promising one
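A minimal sketch of the two evaluation modes, assuming a hypothetical `llm(prompt)` callable; the 1-10 scale and the vote-prompt wording are illustrative choices, not the paper's exact prompts:

```python
import re

def score_state(llm, state):
    """Scoring: rate one state independently on a numeric scale."""
    reply = llm(f"Rate how promising this partial solution is (1-10):\n{state}")
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else 0

def vote_state(llm, states, n_votes=5):
    """Voting: the LLM compares all states and picks the best, n_votes times."""
    listing = "\n".join(f"{i}: {s}" for i, s in enumerate(states))
    votes = [0] * len(states)
    for _ in range(n_votes):
        reply = llm(f"Which candidate is most promising? Reply with its index.\n{listing}")
        match = re.search(r"\d+", reply)
        if match and int(match.group()) < len(states):
            votes[int(match.group())] += 1
    return max(range(len(states)), key=votes.__getitem__)  # winning index
```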
Component 4: Search Algorithm
BFS (Breadth-First Search):

```
maintain a beam of size b
at each level:
    generate k candidates for each state in the beam
    evaluate all candidates
    keep the top-b scoring states for the next level
```

DFS (Depth-First Search):

```
start deep exploration from the root
if the current state's evaluation falls below a threshold → prune and backtrack
if the maximum depth is reached → backtrack
if a solution is found → return it
```
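The BFS variant condenses into a short function; `generate` and `evaluate` are caller-supplied callables (LLM-backed in practice, as in the components above), so this sketch runs with any toy implementations:

```python
# ToT-BFS: generate k candidates per beam state, evaluate, keep the top-b.

def tot_bfs(root, generate, evaluate, depth=3, beam=2, k=3):
    frontier = [root]
    for _ in range(depth):
        # Expand every state in the beam into k candidate thoughts.
        candidates = [c for s in frontier for c in generate(s, k)]
        if not candidates:
            break
        # Keep only the top-b scoring candidates for the next level.
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam]
    return max(frontier, key=evaluate)
```

With toy callables, e.g. `generate = lambda s, k: [s + 1, s + 2, s * 2][:k]` over integer states and `evaluate` as the identity, the search greedily keeps the largest values at each level.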
2.3 ToT Example: Game of 24
Task: Use four numbers (1, 5, 6, 10) with basic arithmetic to get 24.
Thought Step 1 (generate 3 candidates):

```
(a) 10 - 1 = 9   (remaining: 5, 6, 9)    Evaluation: Promising ✓
(b) 1 + 5 = 6    (remaining: 6, 6, 10)   Evaluation: Promising ✓
(c) 6 / 1 = 6    (remaining: 5, 6, 10)   Evaluation: Promising ✓
```

Thought Step 2 (expand from (a)):

```
(a1) 5 + 9 = 14  (remaining: 6, 14)      Evaluation: Unlikely ✗
(a2) 6 - 5 = 1   (remaining: 1, 9)       Evaluation: Unlikely ✗
(a3) 9 - 5 = 4   (remaining: 4, 6)       Evaluation: 4 × 6 = 24! ✓✓✓
```

Solution: 10 - 1 = 9, 9 - 5 = 4, 4 × 6 = 24 ✓
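For tasks like this with a crisp success criterion, an exact verifier is cheap to build; a brute-force checker for the Game of 24 (every pairing and operation, recursing on the reduced list):

```python
from itertools import permutations

def solve24(nums, target=24, eps=1e-6):
    """Return True if `target` is reachable from `nums` with + - * /."""
    if len(nums) == 1:
        return abs(nums[0] - target) < eps
    for a, b, *rest in permutations(nums):
        # Combine the first two numbers every possible way; permutations
        # cover both operand orders, so a-b and a/b suffice.
        results = [a + b, a - b, a * b]
        if abs(b) > eps:
            results.append(a / b)
        if any(solve24([r] + rest, target, eps) for r in results):
            return True
    return False
```

Such a verifier is useful as ground truth when measuring ToT accuracy, since it confirms which puzzle instances are solvable at all.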
2.4 ToT Performance
| Task | CoT Accuracy | ToT Accuracy | Improvement |
|---|---|---|---|
| Game of 24 | 7.3% | 74% | +10x |
| Creative Writing | Baseline | +Significant | - |
| Crosswords | 16% | 60% | +3.75x |
3. Graph of Thoughts (GoT)
3.1 From Trees to Graphs
GoT, proposed by Besta et al. (2024), extends reasoning from tree structures to directed acyclic graphs (DAGs):
```mermaid
graph TD
    S[Initial State] --> T1[Thought 1]
    S --> T2[Thought 2]
    S --> T3[Thought 3]
    T1 --> T4[Refinement 1]
    T2 --> T4
    T2 --> T5[Refinement 2]
    T3 --> T5
    T4 --> T6[Merge + Refine]
    T5 --> T6
    T6 --> ANS[Final Answer]
```
Key New Operations:
| Operation | Description | In ToT? |
|---|---|---|
| Generate | Generate new thought nodes | Yes |
| Evaluate | Evaluate node quality | Yes |
| Aggregate | Merge multiple thoughts into one | No |
| Refine | Improve thoughts based on feedback | No |
| Backtrack | Return to previous nodes | Yes |
3.2 Advantages of GoT
- Information merging: Information from different reasoning branches can be combined
- Iterative refinement: Nodes can be improved multiple times
- More flexible control flow: Not limited to tree structures
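The two operations ToT lacks, Aggregate and Refine, reduce to prompt templates over thought nodes; a sketch with a hypothetical `llm(prompt)` callable and illustrative prompt wording:

```python
# GoT-style operations over thought strings (hypothetical `llm` callable).

def aggregate(llm, thoughts):
    """Aggregate: merge several partial solutions into one node."""
    joined = "\n".join(f"- {t}" for t in thoughts)
    return llm(f"Combine these partial solutions into one:\n{joined}")

def refine(llm, thought, feedback):
    """Refine: improve a node based on evaluator feedback."""
    return llm(f"Improve this solution:\n{thought}\nFeedback: {feedback}")
```

In a full GoT controller these operations are scheduled over the DAG (a node may be refined several times, and aggregation creates nodes with multiple parents).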
4. Monte Carlo Tree Search (MCTS) for LLM
4.1 Classic MCTS
MCTS achieved great success in AlphaGo, with the UCB1 formula at its core:

\[
\text{UCB1}(j) = \bar{X}_j + C \sqrt{\frac{\ln n}{n_j}}
\]

where:
- \(\bar{X}_j\): Average reward of node \(j\) (exploitation term)
- \(C \sqrt{\frac{\ln n}{n_j}}\): Exploration bonus (exploration term)
- \(n\): Total visit count of the parent node
- \(n_j\): Visit count of node \(j\)
- \(C\): Exploration constant, controlling exploration-exploitation balance
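The formula transcribes directly into code; the common convention of giving unvisited children infinite priority is included:

```python
import math

def ucb1(mean_reward, parent_visits, child_visits, c=1.414):
    """UCB1 score: exploitation term plus an exploration bonus that
    shrinks as the child is visited more often."""
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    return mean_reward + c * math.sqrt(math.log(parent_visits) / child_visits)
```

Selection then simply descends the tree by picking the child with the highest `ucb1` score at each level.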
4.2 Four Phases of MCTS
```mermaid
graph LR
    SEL[1. Selection<br/>UCB1-Guided] --> EXP[2. Expansion<br/>Add New Nodes]
    EXP --> SIM[3. Simulation<br/>Random/Heuristic]
    SIM --> BP[4. Backpropagation<br/>Update Statistics]
    BP --> SEL
```
- Selection: Starting from root, select child nodes according to UCB1 until reaching a leaf
- Expansion: Add one or more child nodes to the leaf
- Simulation: From the new node, simulate to terminal state (random rollout or fast policy)
- Backpropagation: Propagate simulation results back along the path, updating statistics at each node
4.3 MCTS + LLM
Applying MCTS to LLM reasoning:
| MCTS Component | Board Games | LLM Reasoning |
|---|---|---|
| State | Board position | Partial reasoning chain |
| Action | Move position | Next reasoning step |
| Expansion | Generate legal moves | LLM generates candidate thoughts |
| Simulation | Random games | LLM quickly completes reasoning |
| Evaluation | Win/loss result | LLM evaluates answer correctness |
| Backpropagation | Update win rate | Update node scores |
MCTS-LLM Algorithm:
```
function MCTS_LLM(question, n_iterations):
    root = Node(state=question)
    for i in 1..n_iterations:
        # 1. Selection: UCB1-guided descent to a leaf
        node = select(root)
        # 2. Expansion: LLM generates candidate thought steps
        children = llm.generate_thoughts(node.state, k=3)
        node.add_children(children)
        # 3. Simulation: LLM quickly completes each child to a final answer
        for child in children:
            answer = llm.fast_complete(child.state)
            reward = evaluate(answer, ground_truth)
            # 4. Backpropagation: update statistics along the path
            backpropagate(child, reward)
    # Return the best path found
    return best_path(root)
```
4.4 Insights from AlphaGo
AlphaGo uses a policy network to guide MCTS search, selecting actions by

\[
a = \arg\max_{a} \left( Q(s, a) + c_{\text{puct}} \, P(s, a) \, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)} \right)
\]

where \(P(s, a)\) is the policy network's prior probability, \(Q(s, a)\) is the action value, and \(N(s, a)\) is the visit count of action \(a\).
Analogy to LLM:
- Policy network → LLM's prior probability (which reasoning steps are more likely correct)
- Value network → LLM's evaluation (how far the current reasoning state is from the correct answer)
- MCTS search → Systematically exploring the reasoning space
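The prior-guided selection score (the PUCT rule used in AlphaGo) combines these pieces; a sketch, with `c_puct` as the exploration constant:

```python
import math

def puct(q, prior, parent_visits, child_visits, c_puct=1.0):
    """PUCT score: action value plus a prior-weighted exploration bonus
    that decays as the action is visited more often."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u
```

In the LLM analogy, `prior` would come from the model's probability of a reasoning step and `q` from the evaluator's estimate of that branch's success rate.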
5. BFS vs. DFS Strategy Comparison
5.1 Applicability Analysis
| Strategy | Pros | Cons | Applicable Scenarios |
|---|---|---|---|
| BFS | Won't miss shallow solutions, efficient beam search | High memory, limited depth | Solutions at shallow levels, need to compare multiple plans |
| DFS | Memory efficient, can explore deep | May get stuck in dead ends | Solutions at deep levels, large search space |
| MCTS | Exploration-exploitation balance, asymptotically optimal | Requires many iterations | Uncertain evaluation functions, need global optimization |
| Best-First | Always expands most promising | Heuristic may mislead | Reliable heuristic function available |
5.2 Search Visualization
```mermaid
graph TD
    subgraph BFS - Breadth First
        R1[Root] --> A1[A]
        R1 --> B1[B]
        R1 --> C1[C]
        A1 --> D1[D]
        A1 --> E1[E]
        B1 --> F1[F]
    end
```
BFS expands level by level: Root → A,B,C → D,E,F → ...
DFS explores depth-first: Root → A → D → (backtrack) → E → (backtrack) → B → F → ...
6. Practical Considerations
6.1 Computational Cost Analysis
Let the cost per LLM call be \(c\), with search parameters:
| Method | LLM Calls | Total Cost |
|---|---|---|
| CoT | 1 | \(c\) |
| Self-Consistency (K=5) | 5 | \(5c\) |
| ToT-BFS (d levels, b beam, k candidates) | \(d \times b \times k\) (generation) + \(d \times b \times k\) (evaluation) | \(O(dbk) \cdot c\) |
| MCTS (N iterations) | \(N \times (\text{expand} + \text{simulate} + \text{evaluate})\) | \(O(N) \cdot 3c\) |
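The call counts in the table can be wrapped in a small helper (costs are in units of one LLM call c; the MCTS figure assumes roughly 3 calls per iteration, as the table does):

```python
def llm_calls(method, **p):
    """Number of LLM calls for each search method (table above)."""
    if method == "cot":
        return 1
    if method == "self_consistency":
        return p["k"]                            # k sampled chains
    if method == "tot_bfs":
        return 2 * p["depth"] * p["beam"] * p["k"]  # generation + evaluation
    if method == "mcts":
        return 3 * p["n_iterations"]             # expand + simulate + evaluate
    raise ValueError(f"unknown method: {method}")
```

For example, a modest ToT-BFS run with depth 3, beam 2, and 3 candidates already costs 36 calls, roughly 7x a 5-chain self-consistency run.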
6.2 When to Use Tree Search?
Worth using:
- Problems with clear evaluation criteria (math, code)
- Tasks requiring creative exploration (writing, design)
- Large but structured solution spaces (planning, strategy)
Not worth using:
- Simple factual queries (direct answer suffices)
- Unreliable evaluation functions (subjective questions)
- High latency requirements (interactive applications)
7. Frontier Developments
- RAP (Reasoning via Planning): Hao et al. (2023) applied MCTS to LLM reasoning
- Everything of Thoughts (XoT): A framework unifying CoT, ToT, and GoT
- Language Agent Tree Search (LATS): Combining ReAct and MCTS for agent search
- Process Reward Models (PRM): Providing fine-grained reward signals for each reasoning step
References
- Yao, S. et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. NeurIPS 2023.
- Besta, M. et al. (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models. AAAI 2024.
- Hao, S. et al. (2023). Reasoning with Language Model is Planning with World Model. EMNLP 2023.
- Silver, D. et al. (2016). Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 529, 484-489.
- Browne, C. et al. (2012). A Survey of Monte Carlo Tree Search Methods. IEEE TCIAIG, 4(1), 1-43.
- Zhou, A. et al. (2023). Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models. arXiv:2310.04406.