Planning and Reasoning Survey

Overview

Planning and reasoning are the core cognitive capabilities of agents. From the STRIPS planner of the early 1970s to the LLM reasoning models of the 2020s, the question of "how agents think and formulate action plans" has undergone fundamental paradigm shifts. This article traces this evolution and establishes a framework for understanding the subsequent specialized topics.


1. Classical Planning

1.1 STRIPS Representation

STRIPS (Stanford Research Institute Problem Solver, 1971) defined the standard framework for classical planning:

Planning Problem \(\mathcal{P} = \langle S, A, s_0, G \rangle\):

  • \(S\): Set of states (power set of propositions)
  • \(A\): Set of actions
  • \(s_0 \in S\): Initial state
  • \(G\): Goal conditions (a set of propositions; a state \(s\) satisfies the goal iff \(G \subseteq s\))

Action Definition \(a = \langle \text{Pre}(a), \text{Add}(a), \text{Del}(a) \rangle\):

  • \(\text{Pre}(a)\): Preconditions (conditions that must hold to execute \(a\))
  • \(\text{Add}(a)\): Add effects (facts added after execution)
  • \(\text{Del}(a)\): Delete effects (facts removed after execution)

State Transition:

\[ \text{Result}(s, a) = (s \setminus \text{Del}(a)) \cup \text{Add}(a) \quad \text{if } \text{Pre}(a) \subseteq s \]
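The transition rule above maps directly onto set operations. The following is a minimal sketch using Python frozensets as states; all names (`Action`, `applicable`, `result`, the logistics propositions) are illustrative, not from any planning library:

```python
# Minimal STRIPS-style actions and state transition. States are frozensets
# of ground propositions; an action is applicable when Pre(a) ⊆ s.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset      # Pre(a): propositions that must hold
    add: frozenset      # Add(a): facts added after execution
    delete: frozenset   # Del(a): facts removed after execution

def applicable(state: frozenset, a: Action) -> bool:
    return a.pre <= state  # Pre(a) ⊆ s

def result(state: frozenset, a: Action) -> frozenset:
    # Result(s, a) = (s \ Del(a)) ∪ Add(a)
    assert applicable(state, a)
    return (state - a.delete) | a.add

# Example: load a package onto a truck at location L1
s0 = frozenset({"at(pkg,L1)", "at(truck,L1)"})
load = Action(
    name="load(pkg,truck,L1)",
    pre=frozenset({"at(pkg,L1)", "at(truck,L1)"}),
    add=frozenset({"in(pkg,truck)"}),
    delete=frozenset({"at(pkg,L1)"}),
)
s1 = result(s0, load)
# s1 == {"at(truck,L1)", "in(pkg,truck)"}
```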

1.2 PDDL

PDDL (Planning Domain Definition Language) is a standardized extension of STRIPS supporting types, quantifiers, conditional effects, and more:

```lisp
(define (domain logistics)
  (:predicates (at ?obj ?loc) (in ?obj ?vehicle))
  (:action load
    :parameters (?obj ?vehicle ?loc)
    :precondition (and (at ?obj ?loc) (at ?vehicle ?loc))
    :effect (and (in ?obj ?vehicle) (not (at ?obj ?loc)))))
```

1.3 Classical Planning Algorithms

| Algorithm | Type | Complexity | Characteristics |
|---|---|---|---|
| Forward Search | State space | Exponential | Searches from the initial state toward the goal |
| Backward Search | State space | Exponential | Regresses from the goal toward the initial state |
| GraphPlan | Graph search | Polynomial graph construction | Builds a planning graph, then extracts a solution |
| SAT Planning | Compilation | SAT complexity | Compiles planning into Boolean satisfiability |
| FF | Heuristic search | Heuristic-guided | Uses a relaxed problem as the heuristic function |
| HTN | Hierarchical | Domain-dependent | Decomposes tasks into subtasks |
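Forward search, the first entry in the table, can be sketched in a few lines as uninformed breadth-first progression over STRIPS states. This is a toy illustration; real planners such as FF add heuristics on top of exactly this skeleton. All names and the logistics example are assumptions for demonstration:

```python
# Toy forward (progression) search: breadth-first from the initial state
# until the goal conditions are a subset of the current state.
from collections import deque

def forward_search(s0, goal, actions):
    """Return a list of action names reaching a state s with goal ⊆ s."""
    frontier = deque([(s0, [])])
    visited = {s0}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:               # goal conditions satisfied
            return plan
        for name, pre, add, delete in actions:
            if pre <= state:            # action applicable
                nxt = (state - delete) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None  # goal unreachable

# Two-step logistics example: load the package, then drive to L2.
actions = [
    ("load", frozenset({"at(pkg,L1)", "at(truck,L1)"}),
     frozenset({"in(pkg,truck)"}), frozenset({"at(pkg,L1)"})),
    ("drive(L1,L2)", frozenset({"at(truck,L1)"}),
     frozenset({"at(truck,L2)"}), frozenset({"at(truck,L1)"})),
]
s0 = frozenset({"at(pkg,L1)", "at(truck,L1)"})
goal = frozenset({"in(pkg,truck)", "at(truck,L2)"})
plan = forward_search(s0, goal, actions)
# plan == ["load", "drive(L1,L2)"]
```

Note the exponential blow-up the table warns about: `visited` can grow as large as the power set of propositions.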

2. Paradigm Shift from Classical to Modern

```mermaid
graph LR
    A[STRIPS/PDDL<br/>Symbolic Planning<br/>1970s-2000s] --> B[MDP/POMDP<br/>Probabilistic Planning<br/>1990s-2010s]
    B --> C[Deep RL<br/>Learned Planning<br/>2013-2020]
    C --> D[LLM Reasoning<br/>Language Planning<br/>2022-]
    D --> E[Reasoning Models<br/>o1/R1<br/>2024-]
```

2.1 Why LLMs Changed Everything

| Dimension | Classical Planning | LLM Planning |
|---|---|---|
| Knowledge source | Manually coded domain models | World knowledge from pre-training corpora |
| State representation | Formal logical propositions | Natural language descriptions |
| Action definition | PDDL operators | Natural language instructions + Tool APIs |
| Goal representation | Logical formulae | Natural language task descriptions |
| Search algorithm | Complete search | Autoregressive generation + Sampling |
| Generalization | In-domain, complete | Cross-domain, but unreliable |
| Failure mode | Search timeout | Hallucination, omission, loops |

Key LLM Advantage: No predefined domain model needed; can directly generate action plans from natural language task descriptions.

Key LLM Weakness: Lacks correctness guarantees; cannot prove plan optimality or completeness like classical planners.


3. Core Methods for LLM Reasoning

3.1 Method Taxonomy

```mermaid
graph TD
    ROOT[LLM Reasoning Methods] --> IO[Direct Input-Output<br/>Standard Prompting]
    ROOT --> COT[Chain-of-Thought]
    ROOT --> REACT[ReAct<br/>Reasoning + Acting]
    ROOT --> TOT[Tree Search<br/>Tree of Thoughts]
    ROOT --> REF[Reflection<br/>Reflexion]
    ROOT --> PE[Plan-Execute]
    ROOT --> RM[Reasoning Models<br/>o1/R1]

    IO --> |Add intermediate steps| COT
    COT --> |Add actions| REACT
    COT --> |Add search| TOT
    REACT --> |Add experience| REF
    REACT --> |Separate planning| PE
    COT --> |Internalize reasoning| RM
```
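The ReAct branch of the taxonomy interleaves reasoning steps with tool calls in a single loop. The skeleton below is a hedged sketch of that loop, not any library's API: `llm`, `tools`, the `Action: name[arg]` format, and `parse_action` are all stand-ins chosen for illustration.

```python
# Skeletal ReAct loop: alternate Thought → Action → Observation until the
# model emits a final answer. `llm` is any callable mapping a transcript
# to the next step; `tools` maps tool names to callables.
def parse_action(step: str):
    """Parse 'Action: name[arg]' from a step (illustrative format)."""
    line = step.split("Action:")[1].strip()
    name, arg = line.split("[", 1)
    return name, arg.rstrip("]")

def react_loop(llm, tools, task, max_steps=8):
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)           # e.g. "Thought: ...\nAction: calc[6*7]"
        transcript += step + "\n"
        if step.startswith("Final:"):    # model decided it is done
            return step[len("Final:"):].strip()
        if "Action:" in step:
            name, arg = parse_action(step)
            obs = tools[name](arg)       # execute the tool
            transcript += f"Observation: {obs}\n"
    return None  # step budget exhausted

# Demo with a scripted "LLM" and a calculator tool.
script = iter([
    "Thought: I need to compute 6*7.\nAction: calc[6*7]",
    "Final: 42",
])
answer = react_loop(lambda _t: next(script),
                    {"calc": lambda expr: eval(expr)},
                    "What is 6*7?")
# answer == "42"
```

The key design point the diagram's "Add actions" edge captures: observations from tools flow back into the transcript, so each thought can condition on real environment feedback rather than on the model's guesses.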

3.2 Method Comparison

| Method | Reasoning Depth | Search Width | Learning | Tool Use | Compute Cost |
|---|---|---|---|---|---|
| Standard | Single step | 1 | None | None | Low |
| CoT | Multi-step chain | 1 | None | None | Low-Medium |
| Self-Consistency | Multi-step chain | K paths | None | None | Medium |
| ReAct | Multi-step chain | 1 | None | Yes | Medium |
| ToT | Tree search | B branches | None | Optional | High |
| Reflexion | Multi-step chain | 1 | Verbal experience | Yes | Medium-High |
| Plan-Execute | Two-phase | 1 | Optional | Yes | Medium-High |
| o1/R1 | Deep chain | Internal search | At training time | Optional | High |
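Self-Consistency, the one row with "K paths," is simple enough to sketch end to end: sample K reasoning chains at nonzero temperature, extract each final answer, and take the majority vote. `sample_chain` and the `Answer:` marker are illustrative assumptions standing in for a real LLM call:

```python
# Self-consistency sketch: K sampled chains, majority vote over answers.
from collections import Counter

def extract_answer(chain: str) -> str:
    """Take the text after the last 'Answer:' marker (illustrative format)."""
    return chain.rsplit("Answer:", 1)[-1].strip()

def self_consistency(sample_chain, question, k=5):
    answers = [extract_answer(sample_chain(question)) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

# Demo with canned chains: 2 of 3 samples agree on 42.
chains = iter([
    "6*7 is 42. Answer: 42",
    "Six sevens make 42. Answer: 42",
    "I think 6*7 = 49. Answer: 49",
])
result = self_consistency(lambda q: next(chains), "What is 6*7?", k=3)
# result == "42"
```

This is why the table lists its cost as Medium rather than Low: the per-chain cost of CoT is multiplied by K, but the vote marginalizes out individual faulty chains.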

4. A Unified Perspective on Planning and Reasoning

From a mathematical perspective, all reasoning and planning methods can be viewed as finding an optimal path in some search space:

\[ \pi^* = \arg\max_{\pi \in \Pi} \sum_{t=0}^{T} R(s_t, a_t) \]
| Method | Search Space | Path | Evaluation Function |
|---|---|---|---|
| Classical Planning | State space | Action sequence | Goal satisfaction |
| MDP | State-action space | Policy | Expected cumulative reward |
| CoT | Token space | Reasoning chain | Output correctness |
| ToT | Thought node tree | Branch path | LLM evaluation score |
| MCTS | Game tree | Optimal move | UCB + Value estimate |
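The unified view above suggests one generic skeleton: expand partial paths with a proposer, score them with an evaluator, and keep the best few. The beam search below is a hedged sketch of that skeleton (roughly the BFS variant of ToT); `propose` and `score` stand in for LLM calls, and the integer demo is purely illustrative:

```python
# Generic beam search over "thoughts": expand each partial chain, score
# candidates with an evaluation function, keep the best `beam` per depth.
def beam_search(propose, score, root, beam=2, depth=3):
    frontier = [[root]]
    for _ in range(depth):
        candidates = [path + [t] for path in frontier for t in propose(path)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    return frontier[0]  # highest-scoring chain found

# Toy demo: thoughts are integers, the proposer branches on +1/+2,
# and the evaluator prefers chains with larger sums.
best = beam_search(
    propose=lambda path: [path[-1] + 1, path[-1] + 2],
    score=sum,
    root=0,
    beam=2,
    depth=3,
)
# best == [0, 2, 4, 6]
```

Each row of the table then corresponds to a choice of `propose` (successor states, sampled thoughts, legal moves) and `score` (goal test, reward, LLM self-evaluation, UCB).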

5. Chapter Contents Guide

| File | Topic | Core Question Addressed |
|---|---|---|
| Chain-of-Thought and Reasoning Patterns | CoT and its variants | How to make LLMs show their reasoning process? |
| ReAct and Tool Reasoning | Reasoning + Action | How to unify thinking and tool use? |
| Tree Search and Monte Carlo | Search methods | How to systematically explore the reasoning space? |
| Reflection and Self-Improvement | Experiential learning | How to learn and improve from failures? |
| Plan-Execute Frameworks | Engineering implementation | How to engineer planning capabilities? |
| Frontier Advances in Reasoning | Latest developments | Where are the limits of reasoning ability? |

References

  1. Fikes, R.E. & Nilsson, N.J. (1971). STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2(3-4), 189-208.
  2. Ghallab, M. et al. (2004). Automated Planning: Theory and Practice. Morgan Kaufmann.
  3. Wei, J. et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 2022.
  4. Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
  5. Huang, W. et al. (2022). Language Models as Zero-Shot Planners. ICML 2022.
