Planning and Reasoning Survey
Overview
Planning and reasoning are the core cognitive capabilities of agents. From the STRIPS planner of the early 1970s to the LLM reasoning models of the 2020s, the question of "how agents think and formulate action plans" has undergone fundamental paradigm shifts. This article traces this evolution and establishes a framework for understanding the subsequent specialized topics.
1. Classical Planning
1.1 STRIPS Representation
STRIPS (Stanford Research Institute Problem Solver, 1971) defined the standard framework for classical planning:
Planning Problem \(\mathcal{P} = \langle S, A, s_0, G \rangle\):
- \(S\): Set of states (power set of propositions)
- \(A\): Set of actions
- \(s_0 \in S\): Initial state
- \(G\): Goal conditions (a set of propositions that must all hold in the final state)
Action Definition \(a = \langle \text{Pre}(a), \text{Add}(a), \text{Del}(a) \rangle\):
- \(\text{Pre}(a)\): Preconditions (conditions that must hold to execute \(a\))
- \(\text{Add}(a)\): Add effects (facts added after execution)
- \(\text{Del}(a)\): Delete effects (facts removed after execution)
State Transition: if \(\text{Pre}(a) \subseteq s\), then \(\gamma(s, a) = (s \setminus \text{Del}(a)) \cup \text{Add}(a)\); otherwise \(a\) is inapplicable in \(s\).
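In STRIPS, executing an applicable action removes its delete effects and adds its add effects. A minimal sketch in Python (states as frozensets of proposition strings; the field names are illustrative):

```python
from typing import FrozenSet, NamedTuple

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # Pre(a): facts required to execute a
    add: FrozenSet[str]     # Add(a): facts added after execution
    delete: FrozenSet[str]  # Del(a); "del" is a Python keyword

def applicable(state: FrozenSet[str], a: Action) -> bool:
    # Pre(a) must be a subset of the current state
    return a.pre <= state

def apply_action(state: FrozenSet[str], a: Action) -> FrozenSet[str]:
    # gamma(s, a) = (s \ Del(a)) | Add(a)
    assert applicable(state, a), f"{a.name} is not applicable"
    return (state - a.delete) | a.add
```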
1.2 PDDL
PDDL (Planning Domain Definition Language) is a standardized extension of STRIPS supporting types, quantifiers, conditional effects, and more:
```lisp
(define (domain logistics)
  (:predicates (at ?obj ?loc) (in ?obj ?vehicle))
  (:action load
    :parameters (?obj ?vehicle ?loc)
    :precondition (and (at ?obj ?loc) (at ?vehicle ?loc))
    :effect (and (in ?obj ?vehicle) (not (at ?obj ?loc)))))
```
1.3 Classical Planning Algorithms
| Algorithm | Type | Complexity | Characteristics |
|---|---|---|---|
| Forward Search | State space | Exponential | Searches from initial state to goal |
| Backward Search | State space | Exponential | Regresses from goal to initial state |
| GraphPlan | Graph search | Polynomial graph construction | Builds planning graph then extracts solution |
| SAT Planning | Compilation | SAT complexity | Converts to Boolean satisfiability problem |
| FF | Heuristic search | Heuristic-guided | Relaxed problem as heuristic function |
| HTN | Hierarchical | Domain-dependent | Decomposes tasks into subtasks |
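Forward search from the table above can be sketched as breadth-first search over STRIPS states; this guarantees a shortest plan but illustrates the exponential state blowup. A toy sketch (encoding actions as `(name, pre, add, delete)` tuples of frozensets is an assumption for illustration):

```python
from collections import deque

def forward_search(s0, goal, actions):
    """Breadth-first forward search over STRIPS states.
    Returns a shortest action-name sequence reaching a state
    where all goal facts hold, or None if the goal is unreachable."""
    start = frozenset(s0)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if frozenset(goal) <= state:
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:  # applicable in this state?
                nxt = (state - delete) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None
```

On a tiny logistics domain (truck drives A<->B, package loaded at A, unloaded at B) this returns the three-step plan load, drive, unload.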
2. Paradigm Shift from Classical to Modern
```mermaid
graph LR
    A[STRIPS/PDDL<br/>Symbolic Planning<br/>1970s-2000s] --> B[MDP/POMDP<br/>Probabilistic Planning<br/>1990s-2010s]
    B --> C[Deep RL<br/>Learned Planning<br/>2013-2020]
    C --> D[LLM Reasoning<br/>Language Planning<br/>2022-]
    D --> E[Reasoning Models<br/>o1/R1<br/>2024-]
```
2.1 Why LLMs Changed Everything
| Dimension | Classical Planning | LLM Planning |
|---|---|---|
| Knowledge source | Manually coded domain models | World knowledge from pre-training corpora |
| State representation | Formal logical propositions | Natural language descriptions |
| Action definition | PDDL operators | Natural language instructions + Tool APIs |
| Goal representation | Logical formulae | Natural language task descriptions |
| Search algorithm | Complete search | Autoregressive generation + Sampling |
| Generalization | In-domain complete | Cross-domain but unreliable |
| Failure mode | Search timeout | Hallucination, omission, loops |
Key LLM Advantage: No predefined domain model needed; can directly generate action plans from natural language task descriptions.
Key LLM Weakness: Lacks correctness guarantees; cannot prove plan optimality or completeness like classical planners.
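One common mitigation for this weakness is a hybrid pattern: let the LLM propose a plan from the natural-language task, then validate the proposal symbolically before execution. A minimal sketch (the domain encoding and action names are illustrative assumptions; the `plan` list would be parsed from LLM output):

```python
def validate_plan(s0, goal, plan, domain):
    """Simulate a proposed plan against a STRIPS-style domain.
    domain maps action name -> (pre, add, delete) frozensets.
    Returns (ok, reason) so a failure can be fed back to the LLM."""
    state = frozenset(s0)
    for step in plan:
        if step not in domain:
            return False, f"unknown action: {step}"
        pre, add, delete = domain[step]
        if not pre <= state:
            return False, f"precondition failure at: {step}"
        state = (state - delete) | add
    if not frozenset(goal) <= state:
        return False, "plan ends without satisfying the goal"
    return True, "valid"
```

The returned reason string is the useful part: it turns a silent hallucination into a concrete error message the LLM can repair in the next round.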
3. Core Methods for LLM Reasoning
3.1 Method Taxonomy
```mermaid
graph TD
    ROOT[LLM Reasoning Methods] --> IO[Direct Input-Output<br/>Standard Prompting]
    ROOT --> COT[Chain-of-Thought]
    ROOT --> REACT[ReAct<br/>Reasoning + Acting]
    ROOT --> TOT[Tree Search<br/>Tree of Thoughts]
    ROOT --> REF[Reflection<br/>Reflexion]
    ROOT --> PE[Plan-Execute]
    ROOT --> RM[Reasoning Models<br/>o1/R1]
    IO --> |Add intermediate steps| COT
    COT --> |Add actions| REACT
    COT --> |Add search| TOT
    REACT --> |Add experience| REF
    REACT --> |Separate planning| PE
    COT --> |Internalize reasoning| RM
```
3.2 Method Comparison
| Method | Reasoning Depth | Search Width | Learning | Tool Use | Compute Cost |
|---|---|---|---|---|---|
| Standard | Single step | 1 | None | None | Low |
| CoT | Multi-step chain | 1 | None | None | Low-Medium |
| Self-Consistency | Multi-step chain | K paths | None | None | Medium |
| ReAct | Multi-step chain | 1 | None | Yes | Medium |
| ToT | Tree search | B branches | None | Optional | High |
| Reflexion | Multi-step chain | 1 | Verbal experience | Yes | Medium-High |
| Plan-Execute | Two-phase | 1 | Optional | Yes | Medium-High |
| o1/R1 | Deep chain | Internal search | At training time | Optional | High |
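Self-Consistency from the table reduces to a few lines: sample K reasoning chains, extract each final answer, and majority-vote. The `sample_fn` callable below stands in for temperature-sampled CoT completions plus answer extraction (an assumption; a real system would call an LLM):

```python
from collections import Counter

def self_consistency(sample_fn, k: int = 5) -> str:
    """Sample k reasoning chains and return the majority-vote answer,
    marginalizing out the individual reasoning paths."""
    answers = [sample_fn() for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]
```

For example, if five sampled chains end in the answers "42", "41", "42", "43", "42", the vote returns "42" even though two chains went astray.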
4. A Unified Perspective on Planning and Reasoning
From a mathematical perspective, all reasoning and planning methods can be viewed as finding an optimal path in some search space:
| Method | Search Space | Path | Evaluation Function |
|---|---|---|---|
| Classical Planning | State space | Action sequence | Goal satisfaction |
| MDP | State-action space | Policy | Expected cumulative reward |
| CoT | Token space | Reasoning chain | Output correctness |
| ToT | Thought node tree | Branch path | LLM evaluation score |
| MCTS | Game tree | Optimal move | UCB + Value estimate |
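The MCTS row's evaluation function, UCB, makes the search-space view concrete: it balances exploitation (mean value so far) against exploration (a visit-count bonus). A minimal selection-step sketch (encoding child nodes as `(name, visits, total_value)` tuples is an illustrative assumption):

```python
import math

def ucb_select(children, c: float = 1.4):
    """Return the name of the child maximizing
    UCB1 = value/visits + c * sqrt(ln(total_visits) / visits)."""
    total = sum(visits for _, visits, _ in children)
    def ucb(child):
        _, visits, value = child
        if visits == 0:
            return float("inf")  # always expand unvisited children first
        return value / visits + c * math.sqrt(math.log(total) / visits)
    return max(children, key=ucb)[0]
```

The same selection rule transfers directly to Tree of Thoughts when the LLM's evaluation score plays the role of the value estimate.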
5. Chapter Contents Guide
| File | Topic | Core Question Addressed |
|---|---|---|
| Chain-of-Thought and Reasoning Patterns | CoT and its variants | How to make LLMs show their reasoning process? |
| ReAct and Tool Reasoning | Reasoning + Action | How to unify thinking and tool use? |
| Tree Search and Monte Carlo | Search methods | How to systematically explore the reasoning space? |
| Reflection and Self-Improvement | Experiential learning | How to learn and improve from failures? |
| Plan-Execute Frameworks | Engineering implementation | How to engineer planning capabilities? |
| Frontier Advances in Reasoning | Latest developments | Where are the limits of reasoning ability? |
References
- Fikes, R.E. & Nilsson, N.J. (1971). STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2(3-4), 189-208.
- Ghallab, M. et al. (2004). Automated Planning: Theory and Practice. Morgan Kaufmann.
- Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
- Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
- Huang, W. et al. (2022). Language Models as Zero-Shot Planners. ICML 2022.