Planning and Reasoning Survey

Overview

Planning and reasoning are the core cognitive capabilities of agents. From the STRIPS planner of the early 1970s to the LLM reasoning models of the 2020s, the question of "how agents think and formulate action plans" has undergone fundamental paradigm shifts. This article traces this evolution and establishes a framework for understanding the subsequent specialized topics.


1. Classical Planning

1.1 STRIPS Representation

STRIPS (Stanford Research Institute Problem Solver, 1971) defined the standard framework for classical planning:

Planning Problem \(\mathcal{P} = \langle S, A, s_0, G \rangle\):

  • \(S\): Set of states (power set of propositions)
  • \(A\): Set of actions
  • \(s_0 \in S\): Initial state
  • \(G\): Goal conditions (a set of propositions; a state \(s\) satisfies the goal iff \(G \subseteq s\))

Action Definition \(a = \langle \text{Pre}(a), \text{Add}(a), \text{Del}(a) \rangle\):

  • \(\text{Pre}(a)\): Preconditions (conditions that must hold to execute \(a\))
  • \(\text{Add}(a)\): Add effects (facts added after execution)
  • \(\text{Del}(a)\): Delete effects (facts removed after execution)

State Transition:

\[ \text{Result}(s, a) = (s \setminus \text{Del}(a)) \cup \text{Add}(a) \quad \text{if } \text{Pre}(a) \subseteq s \]
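The transition rule above maps directly onto set operations. The following is a minimal sketch using Python frozensets as states; all names (`Action`, `applicable`, `result`, the logistics propositions) are illustrative, not from any planning library:

```python
# Minimal STRIPS-style actions and state transition. States are frozensets
# of ground propositions; an action is applicable when Pre(a) ⊆ s.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset      # Pre(a): propositions that must hold
    add: frozenset      # Add(a): facts added after execution
    delete: frozenset   # Del(a): facts removed after execution

def applicable(state: frozenset, a: Action) -> bool:
    return a.pre <= state  # Pre(a) ⊆ s

def result(state: frozenset, a: Action) -> frozenset:
    # Result(s, a) = (s \ Del(a)) ∪ Add(a)
    assert applicable(state, a)
    return (state - a.delete) | a.add

# Example: load a package onto a truck at location L1
s0 = frozenset({"at(pkg,L1)", "at(truck,L1)"})
load = Action(
    name="load(pkg,truck,L1)",
    pre=frozenset({"at(pkg,L1)", "at(truck,L1)"}),
    add=frozenset({"in(pkg,truck)"}),
    delete=frozenset({"at(pkg,L1)"}),
)
s1 = result(s0, load)
# s1 == {"at(truck,L1)", "in(pkg,truck)"}
```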

1.2 PDDL

PDDL (Planning Domain Definition Language) is a standardized extension of STRIPS supporting types, quantifiers, conditional effects, and more:

```lisp
(define (domain logistics)
  (:predicates (at ?obj ?loc) (in ?obj ?vehicle))
  (:action load
    :parameters (?obj ?vehicle ?loc)
    :precondition (and (at ?obj ?loc) (at ?vehicle ?loc))
    :effect (and (in ?obj ?vehicle) (not (at ?obj ?loc)))))
```

1.3 Classical Planning Algorithms

| Algorithm | Type | Complexity | Characteristics |
|---|---|---|---|
| Forward Search | State space | Exponential | Searches from the initial state toward the goal |
| Backward Search | State space | Exponential | Regresses from the goal toward the initial state |
| GraphPlan | Graph search | Polynomial graph construction | Builds a planning graph, then extracts a solution |
| SAT Planning | Compilation | SAT complexity | Compiles planning into Boolean satisfiability |
| FF | Heuristic search | Heuristic-guided | Uses a relaxed problem as the heuristic function |
| HTN | Hierarchical | Domain-dependent | Decomposes tasks into subtasks |
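Forward search, the first entry in the table, can be sketched in a few lines as uninformed breadth-first progression over STRIPS states. This is a toy illustration; real planners such as FF add heuristics on top of exactly this skeleton. All names and the logistics example are assumptions for demonstration:

```python
# Toy forward (progression) search: breadth-first from the initial state
# until the goal conditions are a subset of the current state.
from collections import deque

def forward_search(s0, goal, actions):
    """Return a list of action names reaching a state s with goal ⊆ s."""
    frontier = deque([(s0, [])])
    visited = {s0}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:               # goal conditions satisfied
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:            # action applicable
                nxt = (state - delete) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None  # goal unreachable

# Two-step logistics example: load the package, then drive to L2.
actions = [
    ("load", frozenset({"at(pkg,L1)", "at(truck,L1)"}),
     frozenset({"in(pkg,truck)"}), frozenset({"at(pkg,L1)"})),
    ("drive(L1,L2)", frozenset({"at(truck,L1)"}),
     frozenset({"at(truck,L2)"}), frozenset({"at(truck,L1)"})),
]
s0 = frozenset({"at(pkg,L1)", "at(truck,L1)"})
goal = frozenset({"in(pkg,truck)", "at(truck,L2)"})
plan = forward_search(s0, goal, actions)
# plan == ["load", "drive(L1,L2)"]
```

Note the exponential blow-up the table warns about: `visited` can grow as large as the power set of propositions.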

2. Paradigm Shift from Classical to Modern

```mermaid
graph LR
    A[STRIPS/PDDL<br/>Symbolic Planning<br/>1970s-2000s] --> B[MDP/POMDP<br/>Probabilistic Planning<br/>1990s-2010s]
    B --> C[Deep RL<br/>Learned Planning<br/>2013-2020]
    C --> D[LLM Reasoning<br/>Language Planning<br/>2022-]
    D --> E[Reasoning Models<br/>o1/R1<br/>2024-]
```

2.1 Why LLMs Changed Everything

| Dimension | Classical Planning | LLM Planning |
|---|---|---|
| Knowledge source | Manually coded domain models | World knowledge from pre-training corpora |
| State representation | Formal logical propositions | Natural language descriptions |
| Action definition | PDDL operators | Natural language instructions + Tool APIs |
| Goal representation | Logical formulae | Natural language task descriptions |
| Search algorithm | Complete search | Autoregressive generation + Sampling |
| Generalization | In-domain, complete | Cross-domain, but unreliable |
| Failure mode | Search timeout | Hallucination, omission, loops |

Key LLM Advantage: No predefined domain model needed; can directly generate action plans from natural language task descriptions.

Key LLM Weakness: Lacks correctness guarantees; cannot prove plan optimality or completeness like classical planners.


3. Core Methods for LLM Reasoning

3.1 Method Taxonomy

```mermaid
graph TD
    ROOT[LLM Reasoning Methods] --> IO[Direct Input-Output<br/>Standard Prompting]
    ROOT --> COT[Chain-of-Thought]
    ROOT --> REACT[ReAct<br/>Reasoning + Acting]
    ROOT --> TOT[Tree Search<br/>Tree of Thoughts]
    ROOT --> REF[Reflection<br/>Reflexion]
    ROOT --> PE[Plan-Execute]
    ROOT --> RM[Reasoning Models<br/>o1/R1]

    IO --> |Add intermediate steps| COT
    COT --> |Add actions| REACT
    COT --> |Add search| TOT
    REACT --> |Add experience| REF
    REACT --> |Separate planning| PE
    COT --> |Internalize reasoning| RM
```
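The ReAct branch of the taxonomy interleaves reasoning steps with tool calls in a single loop. The skeleton below is a hedged sketch of that loop, not any library's API: `llm`, `tools`, the `Action: name[arg]` format, and `parse_action` are all stand-ins chosen for illustration.

```python
# Skeletal ReAct loop: alternate Thought → Action → Observation until the
# model emits a final answer. `llm` is any callable mapping a transcript
# to the next step; `tools` maps tool names to callables.
def parse_action(step: str):
    """Parse 'Action: name[arg]' from a step (illustrative format)."""
    line = step.split("Action:")[1].strip()
    name, arg = line.split("[", 1)
    return name, arg.rstrip("]")

def react_loop(llm, tools, task, max_steps=8):
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)           # e.g. "Thought: ...\nAction: calc[6*7]"
        transcript += step + "\n"
        if step.startswith("Final:"):    # model decided it is done
            return step[len("Final:"):].strip()
        if "Action:" in step:
            name, arg = parse_action(step)
            obs = tools[name](arg)       # execute the tool
            transcript += f"Observation: {obs}\n"
    return None  # step budget exhausted

# Demo with a scripted "LLM" and a calculator tool.
script = iter([
    "Thought: I need to compute 6*7.\nAction: calc[6*7]",
    "Final: 42",
])
answer = react_loop(lambda _t: next(script),
                    {"calc": lambda expr: eval(expr)},
                    "What is 6*7?")
# answer == "42"
```

The key design point the diagram's "Add actions" edge captures: observations from tools flow back into the transcript, so each thought can condition on real environment feedback rather than on the model's guesses.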

3.2 Method Comparison

| Method | Reasoning Depth | Search Width | Learning | Tool Use | Compute Cost |
|---|---|---|---|---|---|
| Standard | Single step | 1 | None | None | Low |
| CoT | Multi-step chain | 1 | None | None | Low-Medium |
| Self-Consistency | Multi-step chain | K paths | None | None | Medium |
| ReAct | Multi-step chain | 1 | None | Yes | Medium |
| ToT | Tree search | B branches | None | Optional | High |
| Reflexion | Multi-step chain | 1 | Verbal experience | Yes | Medium-High |
| Plan-Execute | Two-phase | 1 | Optional | Yes | Medium-High |
| o1/R1 | Deep chain | Internal search | At training time | Optional | High |
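Self-Consistency, the one row with "K paths," is simple enough to sketch end to end: sample K reasoning chains at nonzero temperature, extract each final answer, and take the majority vote. `sample_chain` and the `Answer:` marker are illustrative assumptions standing in for a real LLM call:

```python
# Self-consistency sketch: K sampled chains, majority vote over answers.
from collections import Counter

def extract_answer(chain: str) -> str:
    """Take the text after the last 'Answer:' marker (illustrative format)."""
    return chain.rsplit("Answer:", 1)[-1].strip()

def self_consistency(sample_chain, question, k=5):
    answers = [extract_answer(sample_chain(question)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

# Demo with canned chains: 2 of 3 samples agree on 42.
chains = iter([
    "6*7 is 42. Answer: 42",
    "Six sevens make 42. Answer: 42",
    "I think 6*7 = 49. Answer: 49",
])
result = self_consistency(lambda q: next(chains), "What is 6*7?", k=3)
# result == "42"
```

This is why the table lists its cost as Medium rather than Low: the per-chain cost of CoT is multiplied by K, but the vote marginalizes out individual faulty chains.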

4. A Unified Perspective on Planning and Reasoning

From a mathematical perspective, all reasoning and planning methods can be viewed as finding an optimal path in some search space:

\[ \pi^* = \arg\max_{\pi \in \Pi} \sum_{t=0}^{T} R(s_t, a_t) \]
| Method | Search Space | Path | Evaluation Function |
|---|---|---|---|
| Classical Planning | State space | Action sequence | Goal satisfaction |
| MDP | State-action space | Policy | Expected cumulative reward |
| CoT | Token space | Reasoning chain | Output correctness |
| ToT | Thought node tree | Branch path | LLM evaluation score |
| MCTS | Game tree | Optimal move | UCB + Value estimate |
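The unified view above suggests one generic skeleton: expand partial paths with a proposer, score them with an evaluator, and keep the best few. The beam search below is a hedged sketch of that skeleton (roughly the BFS variant of ToT); `propose` and `score` stand in for LLM calls, and the integer demo is purely illustrative:

```python
# Generic beam search over "thoughts": expand each partial chain, score
# candidates with an evaluation function, keep the best `beam` per depth.
def beam_search(propose, score, root, beam=2, depth=3):
    frontier = [[root]]
    for _ in range(depth):
        candidates = [path + [t] for path in frontier for t in propose(path)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]  # highest-scoring chain found

# Toy demo: thoughts are integers, the proposer branches on +1/+2,
# and the evaluator prefers chains with larger sums.
best = beam_search(
    propose=lambda path: [path[-1] + 1, path[-1] + 2],
    score=sum,
    root=0,
    beam=2,
    depth=3,
)
# best == [0, 2, 4, 6]
```

Each row of the table then corresponds to a choice of `propose` (successor states, sampled thoughts, legal moves) and `score` (goal test, reward, LLM self-evaluation, UCB).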

5. Chapter Contents Guide

| File | Topic | Core Question Addressed |
|---|---|---|
| Chain-of-Thought and Reasoning Patterns | CoT and its variants | How to make LLMs show their reasoning process? |
| ReAct and Tool Reasoning | Reasoning + Action | How to unify thinking and tool use? |
| Tree Search and Monte Carlo | Search methods | How to systematically explore the reasoning space? |
| Reflection and Self-Improvement | Experiential learning | How to learn and improve from failures? |
| Plan-Execute Frameworks | Engineering implementation | How to engineer planning capabilities? |
| Frontier Advances in Reasoning | Latest developments | Where are the limits of reasoning ability? |

References

  1. Fikes, R.E. & Nilsson, N.J. (1971). STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2(3-4), 189-208.
  2. Ghallab, M. et al. (2004). Automated Planning: Theory and Practice. Morgan Kaufmann.
  3. Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
  4. Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
  5. Huang, W. et al. (2022). Language Models as Zero-Shot Planners. ICML 2022.
