Classical Planning & Decision-Making
Overview of AI Planning
What is AI Planning
AI Planning studies how agents can automatically generate action sequences to transition from an initial state to a goal state. Unlike generic search problems, planning requires explicit representations of world states, actions, and their effects.
The core planning problem can be formalized as a tuple
\[\Pi = \langle S_0, G, A \rangle\]
where \(S_0\) is the initial state, \(G\) is the goal condition, and \(A\) is the set of available actions.
Taxonomy of Planning Problems
| Dimension | Simple | Complex |
|---|---|---|
| Observability | Fully observable | Partially observable |
| Determinism | Deterministic | Stochastic |
| Time | Discrete | Continuous |
| Agents | Single-agent | Multi-agent |
| Objective | Goal achievement | Objective function optimization |
Planning vs. Search
- Search: States are "black boxes" — only successor functions and goal tests are defined
- Planning: States have structured representations (propositions/predicates), actions have explicit preconditions and effects
This structured representation allows planners to exploit the internal structure of the problem for efficient solving, rather than searching blindly.
Application Domains
- Robotics: Task planning, motion planning, manipulation planning
- Autonomous Driving: Behavior decision-making, path planning, trajectory generation
- Game AI: NPC behavior, strategic planning
- Logistics & Scheduling: Supply chain optimization, resource allocation
- Aerospace: Satellite mission planning, Mars rover autonomous planning
- NLP: Dialogue management, story generation
Classical Planning Representations
Classical planning addresses the most fundamental planning setting: fully observable, deterministic, static, discrete, single-agent environments.
STRIPS Representation
STRIPS (Stanford Research Institute Problem Solver, 1971) is one of the earliest planning representation languages:
- State: A set of ground atoms (propositions)
- Action: Described by a triple \(\langle \text{Precondition}, \text{Add}, \text{Delete} \rangle\)
- Precondition: Conditions required before execution
- Add List: Propositions added after execution
- Delete List: Propositions removed after execution
Example: Blocks World

```
Action: Move(B, x, y)
Precondition: On(B, x) ∧ Clear(B) ∧ Clear(y)
Add: On(B, y) ∧ Clear(x)
Delete: On(B, x) ∧ Clear(y)
```
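These semantics are easy to operationalize: a state is a set of ground atoms, and applying an action removes the delete list and adds the add list. A minimal Python sketch (the `Action` container and the atom strings are illustrative encodings, not part of STRIPS itself):

```python
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    precond: frozenset  # atoms that must hold before execution
    add: frozenset      # atoms added by execution
    delete: frozenset   # atoms removed by execution

def applicable(state: frozenset, a: Action) -> bool:
    return a.precond <= state

def apply_action(state: frozenset, a: Action) -> frozenset:
    assert applicable(state, a)
    return (state - a.delete) | a.add

# Move(B, A, C): move block B from block A onto block C
move = Action("Move(B,A,C)",
              precond=frozenset({"On(B,A)", "Clear(B)", "Clear(C)"}),
              add=frozenset({"On(B,C)", "Clear(A)"}),
              delete=frozenset({"On(B,A)", "Clear(C)"}))

s0 = frozenset({"On(B,A)", "On(A,Table)", "On(C,Table)", "Clear(B)", "Clear(C)"})
s1 = apply_action(s0, move)  # B now sits on C; A is clear, C no longer is
```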
PDDL — Planning Domain Definition Language
PDDL is the standardized representation language for classical planning, separating problems into two parts:
- Domain file: Defines types, predicates, and action templates
- Problem file: Defines specific objects, initial state, and goals
```
;; Domain: Logistics
(:action drive-truck
  :parameters (?truck ?from ?to ?city)
  :precondition (and (truck ?truck) (location ?from)
                     (location ?to) (in-city ?from ?city)
                     (in-city ?to ?city) (at ?truck ?from))
  :effect (and (at ?truck ?to) (not (at ?truck ?from))))
```
PDDL has undergone several extensions: PDDL 2.1 introduced temporal and numeric features; PDDL 3.0 added preferences and constraints.
State-Space Search
Planning problems can be converted to graph search problems:
Forward Search (Progression):
- Start from the initial state, progressively apply actions until the goal is met
- Large branching factor — requires good heuristics
Backward Search (Regression):
- Start from the goal, work backward to find which actions achieve subgoals
- Naturally considers only relevant actions, but must handle partial states
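Forward (progression) search can be sketched as a plain breadth-first search over STRIPS-style states; the tuple encoding of actions below is an illustrative assumption:

```python
from collections import deque

def forward_search(init, goal, actions):
    """Breadth-first progression: apply actions from the initial state
    until the goal atoms are a subset of the current state.
    `actions` is a list of (name, precond, add, delete) frozenset tuples."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if frozenset(goal) <= state:
            return plan
        for name, pre, add, dele in actions:
            if pre <= state:  # action applicable
                nxt = (state - dele) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None  # no plan exists

acts = [
    ("pick", frozenset({"at-home"}), frozenset({"have-keys"}), frozenset()),
    ("go",   frozenset({"have-keys", "at-home"}),
             frozenset({"at-work"}), frozenset({"at-home"})),
]
plan = forward_search({"at-home"}, {"at-work"}, acts)  # → ["pick", "go"]
```

Since BFS expands states level by level, the returned plan is also shortest in action count; the large branching factor mentioned above is exactly why real planners replace this blind expansion with heuristic search.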
Planning Graphs
The Graphplan algorithm (Blum & Furst, 1997) introduced the planning graph data structure:
- Alternating proposition levels and action levels
- Each level records mutex relations — pairs of propositions/actions that cannot hold simultaneously
- Applications:
- Extracting valid plans
- Serving as heuristic functions to estimate goal distance (\(h^{\max}\), \(h^{\text{add}}\), \(h^{FF}\))
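The delete-relaxation heuristics can be computed directly as a cost fixed point over atoms, ignoring delete lists. A minimal \(h^{\text{add}}\) sketch in Python (the action encoding and unit action costs are assumptions; replacing `sum` with `max` over preconditions gives \(h^{\max}\)):

```python
import math

def h_add(state, goal, actions):
    """Additive heuristic under the delete relaxation (delete lists ignored).
    `actions` is a list of (precond, add) pairs of frozensets; unit costs assumed."""
    cost = {p: 0 for p in state}  # atoms already true cost nothing
    changed = True
    while changed:                # iterate to a fixed point
        changed = False
        for pre, add in actions:
            if all(p in cost for p in pre):
                c = 1 + sum(cost[p] for p in pre)
                for q in add:
                    if cost.get(q, math.inf) > c:
                        cost[q] = c
                        changed = True
    return sum(cost.get(g, math.inf) for g in goal)  # inf => goal unreachable

acts = [(frozenset({"p"}), frozenset({"q"})),
        (frozenset({"q"}), frozenset({"r"}))]
h = h_add({"p"}, {"q", "r"}, acts)  # q costs 1, r costs 2 → h = 3
```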
Heuristic Planning
The core idea in modern classical planners: automatically extract heuristic functions from the problem description.
Common heuristics:
| Heuristic | Method | Characteristics |
|---|---|---|
| \(h^{\max}\) | Maximum cost across subgoals | Admissible, but imprecise |
| \(h^{\text{add}}\) | Sum of subgoal costs | Inadmissible, but informative |
| \(h^{FF}\) | Planning graph ignoring delete effects | Balances accuracy and computation |
| LM-cut | Landmark-based cuts | Admissible, suitable for optimal planning |
Representative planners: Fast Downward, LAMA, FF Planner.
Partial-Order Planning
Unlike total-order planning, partial-order planning only constrains the ordering of actions with causal dependencies:
- More flexible, allows parallel execution
- Maintains causal links and threats
- Classic algorithm: POP (Partial-Order Planner)
Behavior Decision-Making
Behavior decision-making concerns how agents select high-level behavioral strategies in complex environments, serving as the bridge between perception and low-level control.
Finite State Machines (FSM)
The simplest behavior decision model, where agents switch among a finite set of states:
- States: Represent behavioral modes (e.g., patrol, chase, flee)
- Transitions: State switches triggered by conditions
- Pros: Intuitive, easy to implement and debug
- Cons: State explosion — \(n\) state variables yield \(O(2^n)\) combined states
Hierarchical FSMs (HFSM) reduce complexity via nesting, but still struggle with large-scale problems.
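A transition-table FSM takes only a few lines; the NPC states and trigger events below are illustrative assumptions:

```python
# Transition table: (current state, event) -> next state
TRANSITIONS = {
    ("patrol", "enemy_seen"): "chase",
    ("chase", "low_health"):  "flee",
    ("chase", "enemy_lost"):  "patrol",
    ("flee", "safe"):         "patrol",
}

def step(state: str, event: str) -> str:
    """Return the next state; stay put if no transition matches."""
    return TRANSITIONS.get((state, event), state)

s = "patrol"
s = step(s, "enemy_seen")  # → "chase"
s = step(s, "low_health")  # → "flee"
```

The state-explosion problem shows up as soon as several such tables must be combined: every new boolean condition can double the number of rows needed.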
Behavior Trees (BT)
Behavior trees are a widely used behavior decision framework in game AI and robotics:
Core Node Types:
| Node Type | Function | Behavior |
|---|---|---|
| Sequence (→) | Sequential execution | Succeeds only if all children succeed; stops on failure |
| Selector (?) | Fallback execution | Stops on first success; fails only if all fail |
| Parallel (⇉) | Parallel execution | Success/failure determined by policy |
| Condition | Condition check | Returns success/failure |
| Action | Execute action | Returns success/failure/running |
| Decorator | Modify child node | Invert, repeat, timeout, etc. |
Pros: Modular, reusable, easy to extend and debug.
Comparison with FSMs: In behavior trees, state is implicit in the tree traversal position — no need to explicitly manage state transitions.
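The core composite nodes can be sketched compactly in Python (a minimal illustration; the node interfaces and the example behaviors are assumptions, and the `running` status is defined but unused in this toy tree):

```python
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Sequence:
    """Ticks children in order; the first non-success (failure/running) propagates."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != SUCCESS:
                return status
        return SUCCESS

class Selector:
    """Ticks children in order; the first non-failure (success/running) propagates."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != FAILURE:
                return status
        return FAILURE

class Condition:
    def __init__(self, pred):
        self.pred = pred
    def tick(self, ctx):
        return SUCCESS if self.pred(ctx) else FAILURE

class Act:
    def __init__(self, fn):
        self.fn = fn
    def tick(self, ctx):
        return self.fn(ctx)

# "Attack if an enemy is visible, otherwise patrol."
tree = Selector(
    Sequence(Condition(lambda c: c["enemy_visible"]),
             Act(lambda c: c["log"].append("attack") or SUCCESS)),
    Act(lambda c: c["log"].append("patrol") or SUCCESS),
)
```

Note how the "state" lives entirely in which branch the tick reaches, matching the FSM comparison above.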
Hierarchical Task Networks (HTN)
HTN planning solves problems through task decomposition:
- Primitive Tasks: Directly executable actions
- Compound Tasks: Must be decomposed into subtasks
- Methods: Describe how to decompose compound tasks into subtask sequences
```
Task: Deliver(package, destination)
  Method 1: [Pick(package), Drive(destination), Drop(package)]
  Method 2: [Pick(package), Fly(destination), Drop(package)]
```
HTN leverages domain knowledge to reduce the search space, well-suited for highly structured applications (logistics, manufacturing).
Representative systems: SHOP2, SIPE-2.
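The decomposition loop itself is short. A SHOP-style depth-first sketch in Python (the operator and method encodings are illustrative assumptions, not SHOP2's actual input format):

```python
def htn_plan(tasks, state, methods, operators):
    """Depth-first HTN decomposition.
    `operators[name]` -> (precond, add, delete) frozensets (primitive tasks);
    `methods[name]`   -> list of candidate subtask lists (compound tasks)."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:                       # primitive: execute if applicable
        pre, add, dele = operators[head]
        if pre <= state:
            tail = htn_plan(rest, (state - dele) | add, methods, operators)
            if tail is not None:
                return [head] + tail
        return None
    for subtasks in methods.get(head, []):      # compound: try each method in turn
        plan = htn_plan(subtasks + rest, state, methods, operators)
        if plan is not None:
            return plan
    return None

operators = {
    "pick":  (frozenset({"at-depot"}), frozenset({"holding"}), frozenset()),
    "drive": (frozenset({"holding"}), frozenset({"at-dest"}), frozenset()),
    "drop":  (frozenset({"holding", "at-dest"}),
              frozenset({"delivered"}), frozenset({"holding"})),
}
methods = {"deliver": [["pick", "drive", "drop"]]}
plan = htn_plan(["deliver"], frozenset({"at-depot"}), methods, operators)
# → ["pick", "drive", "drop"]
```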
Decision-Theoretic Planning
When environments are uncertain, probability and utility must be incorporated:
MDP (Markov Decision Process):
- Formalized as a tuple \(\langle S, A, T, R, \gamma \rangle\): states, actions, transition probabilities \(T(s' \mid s, a)\), rewards \(R(s, a)\), and a discount factor \(\gamma\)
- The solution is a policy \(\pi: S \to A\) maximizing expected discounted cumulative reward
POMDP (Partially Observable MDP):
- The agent cannot directly observe the full state
- Maintains a belief state — a probability distribution over states
- Computationally intractable (PSPACE-complete), typically requiring approximate solutions
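For the fully observable case, value iteration is the standard dynamic-programming solution. A small pure-Python sketch (the dictionary encoding of \(T\) and \(R\) is an assumption):

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Solve a finite MDP by value iteration.
    T[(s, a)] -> list of (prob, next_state); R[(s, a)] -> immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            applicable = [a for a in actions if (s, a) in T]
            if not applicable:          # terminal state: value stays 0
                continue
            best = max(R[(s, a)] + gamma * sum(p * V[t] for p, t in T[(s, a)])
                       for a in applicable)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:                 # converged within tolerance
            return V

states = ["work", "done"]
actions = ["finish"]
T = {("work", "finish"): [(1.0, "done")]}
R = {("work", "finish"): 1.0}
V = value_iteration(states, actions, T, R)  # V["work"] → 1.0, V["done"] → 0.0
```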
Game-Theoretic Planning
In multi-agent settings, one must account for other agents' behavior:
- Nash equilibrium: No agent has an incentive to unilaterally deviate
- Minimax strategy: Optimal strategy in zero-sum games
- Core challenge in multi-agent planning: coordination and communication
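Minimax in a zero-sum game reduces to alternating max/min over the game tree. A minimal sketch (the toy tree and its leaf utilities are made up for illustration):

```python
def minimax(state, maximizing, successors, utility):
    """Minimax value of a zero-sum game state.
    `successors(state)` yields child states; `utility(state)` scores terminal states."""
    children = list(successors(state))
    if not children:                    # terminal state
        return utility(state)
    values = [minimax(c, not maximizing, successors, utility) for c in children]
    return max(values) if maximizing else min(values)

# Toy game tree: max picks a branch, min then picks a leaf.
tree = {"root": ["L", "R"], "L": [3, 5], "R": [2, 9]}
succ = lambda s: tree.get(s, [])        # integer leaves have no successors
util = lambda s: s                      # a leaf's value is its utility
best = minimax("root", True, succ, util)  # min at L → 3, min at R → 2; max → 3
```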
Optimization Methods in Planning
Many variants of planning problems can be formulated as mathematical optimization problems.
Linear Programming and Mixed-Integer Programming
Linear Programming (LP):
- Both objective function and constraints are linear
- Simplex method and Interior Point methods
- Applications: Resource allocation, network flows, relaxations of scheduling problems
Mixed-Integer Programming (MIP) — when some decision variables must take integer values:
- MILP (Mixed-Integer Linear Programming): NP-hard, but modern solvers (Gurobi, CPLEX) can handle considerable scale
- Applications: Obstacle avoidance (binary variables for which side of obstacles), task assignment (0-1 decisions), vehicle routing (TSP/VRP)
Nonlinear Programming and Convex Optimization
Nonlinear Programming (NLP) — the general constrained form:
\[\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \le 0, \quad h_j(\mathbf{x}) = 0\]
where the objective \(f\) or the constraint functions \(g_i\), \(h_j\) may be nonlinear.
Solution methods:
- Gradient Descent / Newton's Method: For unconstrained or simply constrained problems
- Sequential Quadratic Programming (SQP): Approximates the problem as QP at each step
- Interior Point Methods (IPM): Converts inequality constraints to barrier functions
- Augmented Lagrangian Method (ALM): Combines Lagrange multipliers with penalty terms
Convex Optimization — when \(f\) and the feasible region are both convex, local optima are global optima:
- Quadratic Programming (QP): \(\min \frac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x} + \mathbf{f}^T\mathbf{x}\) subject to linear constraints, convex when \(\mathbf{H} \succeq 0\)
- Second-Order Cone Programming (SOCP): constraints restrict variables to second-order cones
- Semidefinite Programming (SDP): the matrix variable is constrained to be positive semidefinite
In trajectory optimization, many problems can be formulated or approximated as convex optimization.
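As a tiny worked instance: for an unconstrained convex QP the gradient is \(\mathbf{H}\mathbf{x} + \mathbf{f}\), so even plain gradient descent converges to the global minimizer. A pure-Python sketch (the matrices are illustrative; practical solvers use interior-point or active-set methods instead):

```python
def qp_gradient_descent(H, f, x0, lr=0.2, iters=500):
    """Minimize (1/2) x'Hx + f'x by gradient descent; gradient is Hx + f.
    A sketch for a small convex QP (H positive definite), not a production solver."""
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        g = [sum(H[i][j] * x[j] for j in range(n)) + f[i] for i in range(n)]
        x = [x[i] - lr * g[i] for i in range(n)]
    return x

H = [[2.0, 0.0], [0.0, 4.0]]
f = [-2.0, -8.0]
x = qp_gradient_descent(H, f, [0.0, 0.0])  # converges toward the minimizer [1, 2]
```

For this diagonal \(\mathbf{H}\), the optimum \(\mathbf{x}^* = -\mathbf{H}^{-1}\mathbf{f} = (1, 2)\) can be checked by hand; the step size must stay below \(2/\lambda_{\max}(\mathbf{H})\) for convergence.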
Constraint Optimization in Planning
Task Assignment Problem — assign \(n\) tasks to \(m\) agents, minimizing total cost:
\[\min \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij} \quad \text{s.t.} \quad \sum_{j} x_{ij} = 1 \;\; \forall i, \quad x_{ij} \in \{0, 1\}\]
For the balanced one-to-one case, the Hungarian algorithm solves this in polynomial time (\(O(n^3)\)).
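For intuition, a brute-force baseline that enumerates all \(n!\) one-to-one assignments (usable only for tiny instances; the cost matrix below is made up):

```python
from itertools import permutations

def brute_force_assignment(cost):
    """Minimum-cost one-to-one assignment by enumerating all n! permutations.
    Exponential; the Hungarian algorithm finds the same optimum in O(n^3)."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):  # perm[i] = agent assigned to task i
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
perm, total = brute_force_assignment(cost)  # → (1, 0, 2) with total cost 5
```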
Vehicle Routing Problem (VRP) — a generalization of TSP: multiple vehicles depart from a depot to serve all customers and return:
- Capacitated VRP (CVRP)
- VRP with Time Windows (VRPTW)
- Typically solved using MIP + heuristics (column generation, adaptive large neighborhood search)
Scheduling Problems — Job Shop Scheduling:
- \(n\) jobs, \(m\) machines, each job has operation precedence constraints
- Objective: Minimize makespan
- NP-hard; in practice solved by combining CP (Constraint Programming) and MIP
Key References
- Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach (AIMA) — Part III: Planning
- Dana Nau et al., Automated Planning: Theory and Practice
- Steven LaValle, Planning Algorithms