Classical Planning & Decision-Making
Overview of AI Planning
What is AI Planning
AI Planning studies how agents can automatically generate action sequences to transition from an initial state to a goal state. Unlike generic search problems, planning requires explicit representations of world states, actions, and their effects.
The core planning problem can be formalized as a tuple
\[\Pi = \langle S_0, G, A \rangle\]
where \(S_0\) is the initial state, \(G\) is the goal condition, and \(A\) is the set of available actions.
Taxonomy of Planning Problems
| Dimension | Simple | Complex |
|---|---|---|
| Observability | Fully observable | Partially observable |
| Determinism | Deterministic | Stochastic |
| Time | Discrete | Continuous |
| Agents | Single-agent | Multi-agent |
| Objective | Goal achievement | Objective function optimization |
Planning vs. Search
- Search: States are "black boxes" — only successor functions and goal tests are defined
- Planning: States have structured representations (propositions/predicates), actions have explicit preconditions and effects
This structured representation allows planners to exploit the internal structure of the problem for efficient solving, rather than searching blindly.
Application Domains
- Robotics: Task planning, motion planning, manipulation planning
- Autonomous Driving: Behavior decision-making, path planning, trajectory generation
- Game AI: NPC behavior, strategic planning
- Logistics & Scheduling: Supply chain optimization, resource allocation
- Aerospace: Satellite mission planning, Mars rover autonomous planning
- NLP: Dialogue management, story generation
Classical Planning Representations
Classical planning addresses the most fundamental planning setting: fully observable, deterministic, static, discrete, single-agent environments.
STRIPS Representation
STRIPS (Stanford Research Institute Problem Solver, 1971) is one of the earliest planning representation languages:
- State: A set of ground atoms (propositions)
- Action: Described by a triple \(\langle \text{Precondition}, \text{Add}, \text{Delete} \rangle\)
- Precondition: Conditions required before execution
- Add List: Propositions added after execution
- Delete List: Propositions removed after execution
Example: Blocks World

```
Action: Move(B, x, y)
Precondition: On(B, x) ∧ Clear(B) ∧ Clear(y)
Add: On(B, y) ∧ Clear(x)
Delete: On(B, x) ∧ Clear(y)
```
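These semantics are easy to operationalize: a state is a set of ground atoms, and applying an action removes the delete list and adds the add list. A minimal Python sketch (the `Action` container and the atom strings are illustrative encodings, not part of STRIPS itself):

```python
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    precond: frozenset  # atoms that must hold before execution
    add: frozenset      # atoms added by execution
    delete: frozenset   # atoms removed by execution

def applicable(state: frozenset, a: Action) -> bool:
    return a.precond <= state

def apply_action(state: frozenset, a: Action) -> frozenset:
    assert applicable(state, a)
    return (state - a.delete) | a.add

# Move(B, A, C): move block B from block A onto block C
move = Action("Move(B,A,C)",
              precond=frozenset({"On(B,A)", "Clear(B)", "Clear(C)"}),
              add=frozenset({"On(B,C)", "Clear(A)"}),
              delete=frozenset({"On(B,A)", "Clear(C)"}))

s0 = frozenset({"On(B,A)", "On(A,Table)", "On(C,Table)", "Clear(B)", "Clear(C)"})
s1 = apply_action(s0, move)  # B now sits on C; A is clear, C no longer is
```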
PDDL — Planning Domain Definition Language
PDDL is the standardized representation language for classical planning, separating problems into two parts:
- Domain file: Defines types, predicates, and action templates
- Problem file: Defines specific objects, initial state, and goals
```
;; Domain: Logistics
(:action drive-truck
  :parameters (?truck ?from ?to ?city)
  :precondition (and (truck ?truck) (location ?from)
                     (location ?to) (in-city ?from ?city)
                     (in-city ?to ?city) (at ?truck ?from))
  :effect (and (at ?truck ?to) (not (at ?truck ?from))))
```
PDDL has undergone several extensions: PDDL 2.1 introduced temporal and numeric features; PDDL 3.0 added preferences and constraints.
State-Space Search
Planning problems can be converted to graph search problems:
Forward Search (Progression):
- Start from the initial state, progressively apply actions until the goal is met
- Large branching factor — requires good heuristics
Backward Search (Regression):
- Start from the goal, work backward to find which actions achieve subgoals
- Naturally considers only relevant actions, but must handle partial states
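Forward (progression) search can be sketched as a plain breadth-first search over STRIPS-style states; the tuple encoding of actions below is an illustrative assumption:

```python
from collections import deque

def forward_search(init, goal, actions):
    """Breadth-first progression: apply actions from the initial state
    until the goal atoms are a subset of the current state.
    `actions` is a list of (name, precond, add, delete) frozenset tuples."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if frozenset(goal) <= state:
            return plan
        for name, pre, add, dele in actions:
            if pre <= state:  # action applicable
                nxt = (state - dele) | add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None  # no plan exists

acts = [
    ("pick", frozenset({"at-home"}), frozenset({"have-keys"}), frozenset()),
    ("go",   frozenset({"have-keys", "at-home"}),
             frozenset({"at-work"}), frozenset({"at-home"})),
]
plan = forward_search({"at-home"}, {"at-work"}, acts)  # → ["pick", "go"]
```

Since BFS expands states level by level, the returned plan is also shortest in action count; the large branching factor mentioned above is exactly why real planners replace this blind expansion with heuristic search.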
Planning Graphs
The Graphplan algorithm (Blum & Furst, 1997) introduced the planning graph data structure:
- Alternating proposition levels and action levels
- Each level records mutex relations — pairs of propositions/actions that cannot hold simultaneously
- Applications:
- Extracting valid plans
- Serving as heuristic functions to estimate goal distance (\(h^{\max}\), \(h^{\text{add}}\), \(h^{FF}\))
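The delete-relaxation heuristics can be computed directly as a cost fixed point over atoms, ignoring delete lists. A minimal \(h^{\text{add}}\) sketch in Python (the action encoding and unit action costs are assumptions; replacing `sum` with `max` over preconditions gives \(h^{\max}\)):

```python
import math

def h_add(state, goal, actions):
    """Additive heuristic under the delete relaxation (delete lists ignored).
    `actions` is a list of (precond, add) pairs of frozensets; unit costs assumed."""
    cost = {p: 0 for p in state}  # atoms already true cost nothing
    changed = True
    while changed:                # iterate to a fixed point
        changed = False
        for pre, add in actions:
            if all(p in cost for p in pre):
                c = 1 + sum(cost[p] for p in pre)
                for q in add:
                    if cost.get(q, math.inf) > c:
                        cost[q] = c
                        changed = True
    return sum(cost.get(g, math.inf) for g in goal)  # inf => goal unreachable

acts = [(frozenset({"p"}), frozenset({"q"})),
        (frozenset({"q"}), frozenset({"r"}))]
h = h_add({"p"}, {"q", "r"}, acts)  # q costs 1, r costs 2 → h = 3
```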
Heuristic Planning
The core idea in modern classical planners: automatically extract heuristic functions from the problem description.
Common heuristics:
| Heuristic | Method | Characteristics |
|---|---|---|
| \(h^{\max}\) | Maximum cost across subgoals | Admissible, but imprecise |
| \(h^{\text{add}}\) | Sum of subgoal costs | Inadmissible, but informative |
| \(h^{FF}\) | Planning graph ignoring delete effects | Balances accuracy and computation |
| LM-cut | Landmark-based cuts | Admissible, suitable for optimal planning |
Representative planners: Fast Downward, LAMA, FF Planner.
Partial-Order Planning
Unlike total-order planning, partial-order planning only constrains the ordering of actions with causal dependencies:
- More flexible, allows parallel execution
- Maintains causal links and threats
- Classic algorithm: POP (Partial-Order Planner)
Behavior Decision-Making
Behavior decision-making concerns how agents select high-level behavioral strategies in complex environments, serving as the bridge between perception and low-level control.
Finite State Machines (FSM)
The simplest behavior decision model, where agents switch among a finite set of states:
- States: Represent behavioral modes (e.g., patrol, chase, flee)
- Transitions: State switches triggered by conditions
- Pros: Intuitive, easy to implement and debug
- Cons: State explosion — \(n\) state variables yield \(O(2^n)\) combined states
Hierarchical FSMs (HFSM) reduce complexity via nesting, but still struggle with large-scale problems.
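A transition-table FSM takes only a few lines; the NPC states and trigger events below are illustrative assumptions:

```python
# Transition table: (current state, event) -> next state
TRANSITIONS = {
    ("patrol", "enemy_seen"): "chase",
    ("chase", "low_health"):  "flee",
    ("chase", "enemy_lost"):  "patrol",
    ("flee", "safe"):         "patrol",
}

def step(state: str, event: str) -> str:
    """Return the next state; stay put if no transition matches."""
    return TRANSITIONS.get((state, event), state)

s = "patrol"
s = step(s, "enemy_seen")  # → "chase"
s = step(s, "low_health")  # → "flee"
```

The state-explosion problem shows up as soon as several such tables must be combined: every new boolean condition can double the number of rows needed.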
Behavior Trees (BT)
Behavior trees are a widely used behavior decision framework in game AI and robotics:
Core Node Types:
| Node Type | Function | Behavior |
|---|---|---|
| Sequence (→) | Sequential execution | Succeeds only if all children succeed; stops on failure |
| Selector (?) | Fallback execution | Stops on first success; fails only if all fail |
| Parallel (⇉) | Parallel execution | Success/failure determined by policy |
| Condition | Condition check | Returns success/failure |
| Action | Execute action | Returns success/failure/running |
| Decorator | Modify child node | Invert, repeat, timeout, etc. |
Pros: Modular, reusable, easy to extend and debug.
Comparison with FSMs: In behavior trees, state is implicit in the tree traversal position — no need to explicitly manage state transitions.
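The core composite nodes can be sketched compactly in Python (a minimal illustration; the node interfaces and the example behaviors are assumptions, and the `running` status is defined but unused in this toy tree):

```python
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Sequence:
    """Ticks children in order; the first non-success (failure/running) propagates."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != SUCCESS:
                return status
        return SUCCESS

class Selector:
    """Ticks children in order; the first non-failure (success/running) propagates."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != FAILURE:
                return status
        return FAILURE

class Condition:
    def __init__(self, pred):
        self.pred = pred
    def tick(self, ctx):
        return SUCCESS if self.pred(ctx) else FAILURE

class Act:
    def __init__(self, fn):
        self.fn = fn
    def tick(self, ctx):
        return self.fn(ctx)

# "Attack if an enemy is visible, otherwise patrol."
tree = Selector(
    Sequence(Condition(lambda c: c["enemy_visible"]),
             Act(lambda c: c["log"].append("attack") or SUCCESS)),
    Act(lambda c: c["log"].append("patrol") or SUCCESS),
)
```

Note how the "state" lives entirely in which branch the tick reaches, matching the FSM comparison above.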
Hierarchical Task Networks (HTN)
HTN planning solves problems through task decomposition:
- Primitive Tasks: Directly executable actions
- Compound Tasks: Must be decomposed into subtasks
- Methods: Describe how to decompose compound tasks into subtask sequences
```
Task: Deliver(package, destination)
  Method 1: [Pick(package), Drive(destination), Drop(package)]
  Method 2: [Pick(package), Fly(destination), Drop(package)]
```
HTN leverages domain knowledge to reduce the search space, well-suited for highly structured applications (logistics, manufacturing).
Representative systems: SHOP2, SIPE-2.
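The decomposition loop itself is short. A SHOP-style depth-first sketch in Python (the operator and method encodings are illustrative assumptions, not SHOP2's actual input format):

```python
def htn_plan(tasks, state, methods, operators):
    """Depth-first HTN decomposition.
    `operators[name]` -> (precond, add, delete) frozensets (primitive tasks);
    `methods[name]`   -> list of candidate subtask lists (compound tasks)."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:                       # primitive: execute if applicable
        pre, add, dele = operators[head]
        if pre <= state:
            tail = htn_plan(rest, (state - dele) | add, methods, operators)
            if tail is not None:
                return [head] + tail
        return None
    for subtasks in methods.get(head, []):      # compound: try each method in turn
        plan = htn_plan(subtasks + rest, state, methods, operators)
        if plan is not None:
            return plan
    return None

operators = {
    "pick":  (frozenset({"at-depot"}), frozenset({"holding"}), frozenset()),
    "drive": (frozenset({"holding"}), frozenset({"at-dest"}), frozenset()),
    "drop":  (frozenset({"holding", "at-dest"}),
              frozenset({"delivered"}), frozenset({"holding"})),
}
methods = {"deliver": [["pick", "drive", "drop"]]}
plan = htn_plan(["deliver"], frozenset({"at-depot"}), methods, operators)
# → ["pick", "drive", "drop"]
```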
Decision-Theoretic Planning
When environments are uncertain, probability and utility must be incorporated:
MDP (Markov Decision Process):
- Formalized as a tuple \(\langle S, A, T, R, \gamma \rangle\): states, actions, transition probabilities \(T(s' \mid s, a)\), rewards \(R(s, a)\), and a discount factor \(\gamma\)
- The solution is a policy \(\pi: S \to A\) maximizing expected discounted cumulative reward
POMDP (Partially Observable MDP):
- The agent cannot directly observe the full state
- Maintains a belief state — a probability distribution over states
- Computationally intractable (PSPACE-complete), typically requiring approximate solutions
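For the fully observable case, value iteration is the standard dynamic-programming solution. A small pure-Python sketch (the dictionary encoding of \(T\) and \(R\) is an assumption):

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Solve a finite MDP by value iteration.
    T[(s, a)] -> list of (prob, next_state); R[(s, a)] -> immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            applicable = [a for a in actions if (s, a) in T]
            if not applicable:          # terminal state: value stays 0
                continue
            best = max(R[(s, a)] + gamma * sum(p * V[t] for p, t in T[(s, a)])
                       for a in applicable)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:                 # converged within tolerance
            return V

states = ["work", "done"]
actions = ["finish"]
T = {("work", "finish"): [(1.0, "done")]}
R = {("work", "finish"): 1.0}
V = value_iteration(states, actions, T, R)  # V["work"] → 1.0, V["done"] → 0.0
```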
Game-Theoretic Planning
In multi-agent settings, one must account for other agents' behavior:
- Nash equilibrium: No agent has an incentive to unilaterally deviate
- Minimax strategy: Optimal strategy in zero-sum games
- Core challenge in multi-agent planning: coordination and communication
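Minimax in a zero-sum game reduces to alternating max/min over the game tree. A minimal sketch (the toy tree and its leaf utilities are made up for illustration):

```python
def minimax(state, maximizing, successors, utility):
    """Minimax value of a zero-sum game state.
    `successors(state)` yields child states; `utility(state)` scores terminal states."""
    children = list(successors(state))
    if not children:                    # terminal state
        return utility(state)
    values = [minimax(c, not maximizing, successors, utility) for c in children]
    return max(values) if maximizing else min(values)

# Toy game tree: max picks a branch, min then picks a leaf.
tree = {"root": ["L", "R"], "L": [3, 5], "R": [2, 9]}
succ = lambda s: tree.get(s, [])        # integer leaves have no successors
util = lambda s: s                      # a leaf's value is its utility
best = minimax("root", True, succ, util)  # min at L → 3, min at R → 2; max → 3
```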
Optimization Methods in Planning
Many variants of planning problems can be formulated as mathematical optimization problems.
Linear Programming and Mixed-Integer Programming
Linear Programming (LP):
- Both objective function and constraints are linear
- Simplex method and Interior Point methods
- Applications: Resource allocation, network flows, relaxations of scheduling problems
Mixed-Integer Programming (MIP) — when some decision variables must take integer values:
- MILP (Mixed-Integer Linear Programming): NP-hard, but modern solvers (Gurobi, CPLEX) can handle considerable scale
- Applications: Obstacle avoidance (binary variables for which side of obstacles), task assignment (0-1 decisions), vehicle routing (TSP/VRP)
Nonlinear Programming and Convex Optimization
Nonlinear Programming (NLP) — the general constrained form:
\[\min_{\mathbf{x}} f(\mathbf{x}) \quad \text{s.t.} \quad g_i(\mathbf{x}) \le 0, \quad h_j(\mathbf{x}) = 0\]
where the objective \(f\) or the constraint functions \(g_i\), \(h_j\) may be nonlinear.
Solution methods:
- Gradient Descent / Newton's Method: For unconstrained or simply constrained problems
- Sequential Quadratic Programming (SQP): Approximates the problem as QP at each step
- Interior Point Methods (IPM): Converts inequality constraints to barrier functions
- Augmented Lagrangian Method (ALM): Combines Lagrange multipliers with penalty terms
Convex Optimization — when \(f\) and the feasible region are both convex, local optima are global optima:
- Quadratic Programming (QP): \(\min \frac{1}{2}\mathbf{x}^T\mathbf{H}\mathbf{x} + \mathbf{f}^T\mathbf{x}\) subject to linear constraints, convex when \(\mathbf{H} \succeq 0\)
- Second-Order Cone Programming (SOCP): constraints restrict variables to second-order cones
- Semidefinite Programming (SDP): the matrix variable is constrained to be positive semidefinite
In trajectory optimization, many problems can be formulated or approximated as convex optimization.
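As a tiny worked instance: for an unconstrained convex QP the gradient is \(\mathbf{H}\mathbf{x} + \mathbf{f}\), so even plain gradient descent converges to the global minimizer. A pure-Python sketch (the matrices are illustrative; practical solvers use interior-point or active-set methods instead):

```python
def qp_gradient_descent(H, f, x0, lr=0.2, iters=500):
    """Minimize (1/2) x'Hx + f'x by gradient descent; gradient is Hx + f.
    A sketch for a small convex QP (H positive definite), not a production solver."""
    x = list(x0)
    n = len(x)
    for _ in range(iters):
        g = [sum(H[i][j] * x[j] for j in range(n)) + f[i] for i in range(n)]
        x = [x[i] - lr * g[i] for i in range(n)]
    return x

H = [[2.0, 0.0], [0.0, 4.0]]
f = [-2.0, -8.0]
x = qp_gradient_descent(H, f, [0.0, 0.0])  # converges toward the minimizer [1, 2]
```

For this diagonal \(\mathbf{H}\), the optimum \(\mathbf{x}^* = -\mathbf{H}^{-1}\mathbf{f} = (1, 2)\) can be checked by hand; the step size must stay below \(2/\lambda_{\max}(\mathbf{H})\) for convergence.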
Constraint Optimization in Planning
Task Assignment Problem — assign \(n\) tasks to \(m\) agents, minimizing total cost:
\[\min \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} x_{ij} \quad \text{s.t.} \quad \sum_{j} x_{ij} = 1 \;\; \forall i, \quad x_{ij} \in \{0, 1\}\]
For the balanced one-to-one case, the Hungarian algorithm solves this in polynomial time (\(O(n^3)\)).
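For intuition, a brute-force baseline that enumerates all \(n!\) one-to-one assignments (usable only for tiny instances; the cost matrix below is made up):

```python
from itertools import permutations

def brute_force_assignment(cost):
    """Minimum-cost one-to-one assignment by enumerating all n! permutations.
    Exponential; the Hungarian algorithm finds the same optimum in O(n^3)."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):  # perm[i] = agent assigned to task i
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost

cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
perm, total = brute_force_assignment(cost)  # → (1, 0, 2) with total cost 5
```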
Vehicle Routing Problem (VRP) — a generalization of TSP: multiple vehicles depart from a depot to serve all customers and return:
- Capacitated VRP (CVRP)
- VRP with Time Windows (VRPTW)
- Typically solved using MIP + heuristics (column generation, adaptive large neighborhood search)
Scheduling Problems — Job Shop Scheduling:
- \(n\) jobs, \(m\) machines, each job has operation precedence constraints
- Objective: Minimize makespan
- NP-hard; in practice solved by combining CP (Constraint Programming) and MIP
Key References
- Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach (AIMA) — Part III: Planning
- Dana Nau et al., Automated Planning: Theory and Practice
- Steven LaValle, Planning Algorithms