Task and Motion Planning Theory

Overview

Task and Motion Planning (TAMP) is one of the core problems in embodied intelligence: how to decompose high-level task goals into executable physical action sequences. This article systematically introduces the TAMP theoretical framework, including PDDL formalization, HTN hierarchical decomposition, PDDLStream sampling integration, and LLMs as a new paradigm for modern TAMP engines.


1. TAMP Problem Definition

1.1 Why TAMP Is Needed

Traditionally, task planning and motion planning were studied separately:

  • Task Planning: Reasoning about action sequences at the symbolic level (e.g., "first pick up the cup, then pour water")
  • Motion Planning: Generating collision-free trajectories at the geometric/physical level

However, in actual robot manipulation, the two are tightly coupled:

  • Some symbolically feasible plans are geometrically infeasible (e.g., an object is occluded and cannot be directly grasped)
  • Motion constraints influence task-level decisions (e.g., move an obstacle first)
  • Both discrete decisions (what to do) and continuous parameters (how to do it) must be considered simultaneously

1.2 TAMP Architecture

flowchart TB
    subgraph Task Layer
        A[Task Goal] --> B[Symbolic Planner]
        B --> C[Action Skeleton<br/>pick→place→pour]
    end
    subgraph Geometric Layer
        C --> D[Parameter Sampling]
        D --> E[Motion Planning]
        E --> F[Feasibility Check]
    end
    F -->|Infeasible| B
    F -->|Feasible| G[Execute Trajectory]

    style A fill:#e3f2fd
    style B fill:#f3e5f5
    style C fill:#fff8e1
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#fce4ec

1.3 Formal Definition

A TAMP problem can be defined as a tuple \(\langle \mathcal{S}, \mathcal{A}, s_0, G, \mathcal{C} \rangle\):

  • \(\mathcal{S} = \mathcal{S}_{\text{discrete}} \times \mathcal{S}_{\text{continuous}}\): Hybrid state space
  • \(\mathcal{A}\): Action set, where each action has discrete and continuous parameters
  • \(s_0\): Initial state
  • \(G\): Goal condition
  • \(\mathcal{C}\): Geometric/physical constraints

2. PDDL: Planning Domain Definition Language

2.1 PDDL Basics

PDDL (Planning Domain Definition Language) is the standard formalization language for task planning, consisting of two parts:

Domain: Defines types, predicates, and action templates

(define (domain pick-and-place)
  (:predicates
    (on ?obj ?surface)
    (holding ?obj)
    (clear ?obj)
    (arm-empty)
  )

  (:action pick
    :parameters (?obj ?surface)
    :precondition (and 
      (on ?obj ?surface) 
      (clear ?obj) 
      (arm-empty))
    :effect (and 
      (holding ?obj) 
      (not (on ?obj ?surface)) 
      (not (arm-empty))
      (clear ?surface))
  )

  (:action place
    :parameters (?obj ?surface)
    :precondition (and 
      (holding ?obj) 
      (clear ?surface))
    :effect (and 
      (on ?obj ?surface) 
      (not (holding ?obj)) 
      (arm-empty)
      (not (clear ?surface)))
  )
)

Problem: Defines specific objects, initial state, and goals

(define (problem example)
  (:domain pick-and-place)
  (:objects cup plate table counter)
  (:init 
    (on cup table) 
    (on plate counter) 
    (clear cup) 
    (clear plate) 
    (clear table)
    (clear counter)
    (arm-empty))
  (:goal (and 
    (on cup counter) 
    (on plate table)))
)

2.2 Action Schema

Each action schema contains:

\[\text{Action} = \langle \text{name}, \text{params}, \text{pre}, \text{eff}^+, \text{eff}^- \rangle\]
  • Preconditions: Conditions that must be satisfied before execution
  • Positive Effects: Predicates that become true after execution
  • Negative Effects: Predicates that become false after execution

2.3 Planning Algorithms

Classical planning can be viewed as a graph search problem:

\[f(n) = g(n) + h(n)\]

where:

  • \(g(n)\): Cost from the initial state to node \(n\)
  • \(h(n)\): Heuristic estimate from \(n\) to the goal

Common Heuristics:

| Heuristic | Definition | Characteristics |
|---|---|---|
| \(h_{\text{add}}\) | Assumes subgoals are independent; sums their individual costs | Not admissible, but informative |
| \(h_{\text{max}}\) | Takes the maximum of the individual subgoal costs | Admissible, but not tight |
| \(h_{\text{FF}}\) | Cost of a relaxed plan (ignoring delete effects) | Commonly used, good performance |
| \(h_{\text{LM-cut}}\) | Delete-relaxation bound computed from landmark cuts | Admissible, very tight |
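
Both \(h_{\text{add}}\) and \(h_{\text{max}}\) can be computed by the same cost-propagation fixpoint over the delete relaxation, differing only in how precondition costs are combined. A minimal sketch with unit action costs (the toy atoms and actions are illustrative, not from any benchmark domain):

```python
import math

def relaxed_costs(init_atoms, actions, combine):
    """Cheapest-achievement costs under the delete relaxation.

    actions: iterable of (preconditions, add_effects), unit cost each.
    combine = sum gives h_add; combine = max gives h_max."""
    cost = {a: 0 for a in init_atoms}
    changed = True
    while changed:  # propagate until no atom's cost improves
        changed = False
        for pre, add in actions:
            if all(p in cost for p in pre):
                base = combine([cost[p] for p in pre]) if pre else 0
                for q in add:
                    if cost.get(q, math.inf) > base + 1:
                        cost[q] = base + 1
                        changed = True
    return cost

# Toy domain: goal g needs both c (two steps away) and d (one step away).
acts = [({"a"}, {"b"}), ({"b"}, {"c"}), ({"a"}, {"d"}), ({"c", "d"}, {"g"})]
h_add = relaxed_costs({"a"}, acts, sum)   # h_add(g) = (2 + 1) + 1 = 4
h_max = relaxed_costs({"a"}, acts, max)   # h_max(g) = max(2, 1) + 1 = 3
```

The example shows why \(h_{\text{add}}\) is inadmissible: it double-counts the shared prefix of the two subgoals, while \(h_{\text{max}}\) stays a lower bound.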

Major Planners:

  • Fast Downward: Supports multiple heuristics and search strategies
  • LAMA: Multi-heuristic lazy search
  • Pyperplan: Python implementation, suitable for teaching
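
To make the search concrete, the pick-and-place example from Section 2.1 is small enough to solve with plain breadth-first search over ground states. The sketch below is plain Python, not any planner's actual API; the initial state includes clear facts for both surfaces, which the place precondition requires:

```python
from collections import deque

# Ground STRIPS encoding of the Section 2.1 example.
# A state is a frozenset of atoms; an action is (name, pre, add, del).
def ground_actions(objs=("cup", "plate"), surfaces=("table", "counter")):
    acts = []
    for o in objs:
        for s in surfaces:
            acts.append((f"pick {o} {s}",
                         {("on", o, s), ("clear", o), ("arm-empty",)},
                         {("holding", o), ("clear", s)},
                         {("on", o, s), ("arm-empty",)}))
            acts.append((f"place {o} {s}",
                         {("holding", o), ("clear", s)},
                         {("on", o, s), ("arm-empty",)},
                         {("holding", o), ("clear", s)}))
    return acts

def bfs_plan(init, goal):
    """Breadth-first search in state space; returns a shortest plan."""
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:  # goal predicates all hold
            return plan
        for name, pre, add, delete in ground_actions():
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

init = {("on", "cup", "table"), ("on", "plate", "counter"),
        ("clear", "cup"), ("clear", "plate"),
        ("clear", "table"), ("clear", "counter"), ("arm-empty",)}
goal = {("on", "cup", "counter"), ("on", "plate", "table")}
plan = bfs_plan(init, goal)  # four steps: two picks and two places
```

Real planners such as Fast Downward add grounding, heuristics, and far better data structures; the point here is only the state-transition semantics of the action schemas.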

3. PDDLStream: Sampling Integration

3.1 Limitations of PDDL

Standard PDDL assumes all parameters come from finite discrete sets, but robot manipulation involves continuous parameters:

  • Grasp pose \(T_{\text{grasp}} \in SE(3)\)
  • Placement position \((x, y, z) \in \mathbb{R}^3\)
  • Motion trajectory \(\tau: [0,1] \rightarrow \mathcal{C}\)

3.2 The PDDLStream Framework

Garrett et al. (2020) proposed PDDLStream, integrating continuous parameter sampling into symbolic planning through streams:

Stream Definition:

(:stream sample-grasp
  :inputs (?obj)
  :domain (Graspable ?obj)
  :outputs (?grasp)
  :certified (GraspPose ?obj ?grasp)
)

(:stream plan-motion
  :inputs (?q1 ?q2)
  :domain (and (Config ?q1) (Config ?q2))
  :outputs (?traj)
  :certified (and (Motion ?q1 ?traj ?q2) (Collision-free ?traj))
)
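
In the PDDLStream implementation, each such declaration is backed by a Python conditional generator. A schematic version of sample-grasp (the yaw-angle sampler is a placeholder assumption, not the library's actual API):

```python
import random

def sample_grasp(obj):
    """Backs the sample-grasp stream: given a graspable object,
    lazily yields candidate grasps (here: placeholder yaw angles
    about the object's vertical axis)."""
    while True:
        yaw = random.uniform(-3.14159, 3.14159)
        # each yielded tuple certifies (GraspPose ?obj ?grasp)
        yield (("grasp", obj, yaw),)

# The planner pulls values only when a skeleton needs them:
gen = sample_grasp("cup")
first_grasp = next(gen)[0]
```

Because streams are lazy, the planner can interleave symbolic search with sampling, drawing more candidates only for skeletons that survive the feasibility check.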

Workflow:

  1. The symbolic planner generates an abstract plan skeleton
  2. For continuous parameters in the skeleton, corresponding streams are called to sample
  3. Check whether the sampled results satisfy geometric constraints
  4. If not, backtrack and try a different skeleton or sampling

3.3 Sampling Strategies

Adaptive Sampling: Prioritize samplers that are more likely to succeed, down-weighting those that have failed often

\[p(\text{sample}) \propto \exp\left(-\frac{\text{failure\_count}}{\tau}\right)\]

Learning-Based Sampling: Use neural networks to learn sampling distributions

\[T_{\text{grasp}} \sim q_\phi(T | o_{\text{obj}}, s_{\text{scene}})\]
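
The adaptive rule is a softmax over negative failure counts; a minimal sketch that selects among several candidate samplers (the counts are illustrative):

```python
import math, random

def pick_sampler(failure_counts, tau=2.0):
    """Choose a sampler index with probability proportional to
    exp(-failures / tau), so samplers that keep failing are tried less."""
    weights = [math.exp(-f / tau) for f in failure_counts]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):  # roulette-wheel selection
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

counts = [0, 3, 12]            # failures per grasp sampler so far
choice = pick_sampler(counts)  # index 0 is the most likely pick
```

The temperature \(\tau\) controls how aggressively failures are punished: large \(\tau\) keeps exploring all samplers, small \(\tau\) quickly concentrates on the best one.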

4. Hierarchical Task Networks (HTN)

4.1 Basic Concepts

HTN (Hierarchical Task Network) planning progressively refines complex tasks into primitive actions through recursive decomposition:

Core Idea:

\[\text{Abstract Task} \xrightarrow{\text{Decomposition Method}} \text{Subtask Sequence} \xrightarrow{\text{Recursive Decomposition}} \text{Primitive Actions}\]

4.2 Formalization

An HTN domain is defined as \(\langle \mathcal{O}, \mathcal{M} \rangle\):

  • \(\mathcal{O}\): Primitive operators -- directly executable actions
  • \(\mathcal{M}\): Methods -- define how to decompose compound tasks

Method Definition:

\[m = \langle \text{task}, \text{precondition}, \text{subtasks} \rangle\]

Example:

Method: make-coffee
  Task: (prepare-drink coffee)
  Precondition: (and (have coffee-beans) (have water) (have cup))
  Subtasks: 
    1. (grind coffee-beans)
    2. (boil water)
    3. (brew ground-coffee hot-water)
    4. (pour coffee cup)
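
This refinement can be sketched as depth-first decomposition: tasks with a method expand recursively, everything else is primitive. The method table below follows the example (with a hypothetical sub-method for boil-water to show recursion); precondition checks are omitted for brevity:

```python
# Methods map each compound task to its ordered subtasks (Section 4.2).
METHODS = {
    "prepare-drink coffee": [
        "grind coffee-beans",
        "boil water",
        "brew ground-coffee hot-water",
        "pour coffee cup",
    ],
    "boil water": ["fill kettle", "heat kettle"],  # hypothetical sub-method
}

def decompose(task):
    """Depth-first refinement: expand compound tasks via their method,
    emit primitive tasks as-is."""
    if task not in METHODS:
        return [task]
    plan = []
    for sub in METHODS[task]:
        plan.extend(decompose(sub))
    return plan

plan = decompose("prepare-drink coffee")
```

A full HTN planner additionally checks method preconditions against the current state and backtracks over alternative methods when a decomposition fails.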

4.3 HTN vs. Classical Planning

| Dimension | Classical Planning (STRIPS/PDDL) | HTN |
|---|---|---|
| Goal expression | Goal-state predicates | Tasks to be completed |
| Search space | State space | Decomposition space |
| Domain knowledge | Action schemas only | Decomposition methods (more prior knowledge) |
| Plan quality | Optimality can be guaranteed (with some heuristics) | Depends on method quality |
| Computational efficiency | Can be slow | Accelerated by domain knowledge |

5. LLMs as Modern TAMP

5.1 Zero-Shot Task Decomposition with LLMs

Large language models naturally possess task decomposition capabilities:

Traditional TAMP: Requires manually defined domains and methods

LLM-TAMP: Directly generates task plans from natural language instructions

User: "Help me make a cup of coffee"
LLM: 
1. Walk to the kitchen counter
2. Pick up the coffee cup
3. Place it under the coffee machine
4. Press the coffee machine button
5. Wait for the coffee to brew
6. Pick up the coffee cup
7. Bring it to the user

5.2 Representative Works

SayCan (Ahn et al., 2022)

LLM provides a semantic score, robot skills provide a feasibility score:

\[a^* = \arg\max_a \underbrace{p_{\text{LLM}}(\text{useful} | a, l)}_{\text{Semantic Score}} \cdot \underbrace{p_{\text{affordance}}(\text{possible} | a, s)}_{\text{Feasibility Score}}\]
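
The selection rule is simply an argmax over the product of the two scores; a toy version with hand-set probabilities (the skill names and numbers are made up for illustration):

```python
def saycan_select(skills, semantic, affordance):
    """Pick the skill maximizing p_LLM(useful) * p_affordance(possible)."""
    return max(skills, key=lambda a: semantic[a] * affordance[a])

skills = ["pick cup", "open fridge", "wipe table"]
semantic = {"pick cup": 0.7, "open fridge": 0.9, "wipe table": 0.2}
affordance = {"pick cup": 0.8, "open fridge": 0.3, "wipe table": 0.9}
best = saycan_select(skills, semantic, affordance)  # 0.56 beats 0.27 and 0.18
```

Note how the product filters out "open fridge": the LLM finds it semantically useful, but the affordance model judges it infeasible in the current state.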

Code as Policies (Liang et al., 2023)

LLM directly generates executable Python code:

# LLM-generated policy code
def pour_water_into_cup():
    bottle = detect("water bottle")
    cup = detect("cup")
    table = detect("table")
    grasp(bottle)
    move_to(above(cup))
    tilt(angle=90, duration=3)
    place(bottle, on=table)

Inner Monologue (Huang et al., 2022)

Closed-loop LLM planning with environmental feedback:

LLM: Pick up the red cup
Environment Feedback: Grasp failed, cup is behind an obstacle
LLM: First move the obstacle aside
Environment Feedback: Successfully moved
LLM: Now pick up the red cup
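
This pattern amounts to replanning after every feedback message, with the accumulated dialogue fed back into the next query. Schematically (llm and execute here are stand-in callables, not any real API):

```python
def inner_monologue(goal, llm, execute, max_steps=10):
    """Closed-loop planning: alternate LLM proposals with environment
    feedback, feeding the full history back into the next query."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm(history)        # next step given goal + feedback so far
        if action == "done":
            break
        feedback = execute(action)   # e.g. "success" or a failure message
        history += [f"Act: {action}", f"Feedback: {feedback}"]
    return history
```

The key design choice is that failure messages stay in the prompt, so the model can revise its plan (move the obstacle) instead of blindly retrying the failed action.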

5.3 LLM-TAMP vs. Traditional TAMP

| Dimension | Traditional TAMP | LLM-TAMP |
|---|---|---|
| Domain definition | Requires manual PDDL | Natural language suffices |
| Generalization | Limited to predefined domains | Open-vocabulary, zero-shot |
| Reliability | Formal guarantees | May hallucinate |
| Optimality | Can be guaranteed | Not guaranteed |
| Physical feasibility | Verified through geometric checks | Requires additional verification |
| Commonsense reasoning | Limited | Powerful |

5.4 Integration Approaches

Recent research combines the strengths of both approaches:

  • LLM + PDDL: LLM generates PDDL domain definitions; traditional planners solve
  • LLM + Verifier: LLM generates plans; formal tools verify feasibility
  • LLM + Constraint Solver: LLM provides high-level structure; constraint solvers handle details

6. Motion Planning Fundamentals

6.1 Configuration Space

Motion planning is performed in the configuration space \(\mathcal{C}\):

  • \(\mathcal{C}_{\text{free}}\): Free space (collision-free)
  • \(\mathcal{C}_{\text{obs}}\): Obstacle space
  • Goal: Find a path from \(q_{\text{start}}\) to \(q_{\text{goal}}\) in \(\mathcal{C}_{\text{free}}\)

6.2 Sampling-Based Planning

RRT (Rapidly-exploring Random Tree):

Algorithm RRT:
1. Initialize tree T = {q_start}
2. Repeat:
   a. Randomly sample q_rand in C
   b. Find nearest node q_near in T
   c. Extend toward q_rand by step Δq to get q_new
   d. If edge(q_near, q_new) is in C_free:
      Add q_new to T
3. Until q_goal is connected
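
A minimal 2D instantiation of this pseudocode in the unit square, with an empty workspace so every extension is trivially collision-free (a real planner would validate edge(q_near, q_new) against C_obs before adding it):

```python
import math, random

def rrt(q_start, q_goal, step=0.1, goal_tol=0.15, iters=5000, seed=0):
    """Rapidly-exploring Random Tree in the unit square.

    Returns a path from q_start to (near) q_goal, or None.
    parent[i] stores each node's tree edge for path reconstruction."""
    rng = random.Random(seed)
    nodes, parent = [q_start], {0: None}
    for _ in range(iters):
        q_rand = (rng.random(), rng.random())
        # nearest node in the tree
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d < 1e-9:
            continue
        # extend one step toward q_rand (edge assumed collision-free here)
        q_new = (q_near[0] + step * (q_rand[0] - q_near[0]) / d,
                 q_near[1] + step * (q_rand[1] - q_near[1]) / d)
        nodes.append(q_new)
        parent[len(nodes) - 1] = i_near
        if math.dist(q_new, q_goal) < goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:       # walk parents back to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

path = rrt((0.05, 0.05), (0.95, 0.95))
```

The brute-force nearest-neighbor search makes each iteration O(n); practical implementations use a k-d tree, and many add goal biasing (occasionally sampling q_goal itself) to speed convergence.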

RRT* (Asymptotically Optimal):

Adds a rewire step to RRT, guaranteeing that as the number of samples increases, the path cost converges (almost surely) to the optimal cost:

\[\lim_{n \to \infty} \text{cost}(\text{RRT}^*) = \text{cost}^*\]

6.3 Trajectory Optimization

Formulating motion planning as an optimization problem:

\[\min_\tau \int_0^T \left[ \|\ddot{q}(t)\|^2 + \lambda \cdot c_{\text{obs}}(q(t)) \right] dt\]
\[\text{s.t.} \quad q(0) = q_{\text{start}}, \quad q(T) = q_{\text{goal}}, \quad q(t) \in \mathcal{C}_{\text{free}}\]

Representative Methods:

  • CHOMP: Covariant Hamiltonian optimization
  • TrajOpt: Sequential convex optimization
  • STOMP: Stochastic trajectory optimization
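
With the obstacle term dropped (\(c_{\text{obs}} = 0\)) and time discretized, the objective reduces to the sum of squared second differences of the waypoints, whose minimizer with pinned endpoints is the straight line. Coordinate descent on this cost gives a minimal sketch of the trajectory-optimization view (step size and iteration count are ad hoc, not from any of the methods above):

```python
def smooth(waypoints, iters=2000, lr=0.05):
    """Minimize sum_j |q[j-1] - 2 q[j] + q[j+1]|^2 (discrete ||q_ddot||^2)
    over interior waypoints; the endpoints stay fixed."""
    q = [list(p) for p in waypoints]
    n, dim = len(q), len(q[0])
    for _ in range(iters):
        for i in range(1, n - 1):
            for d in range(dim):
                g = 0.0  # d/dq[i][d] of the summed squared accelerations
                for j in range(1, n - 1):
                    acc = q[j - 1][d] - 2 * q[j][d] + q[j + 1][d]
                    if j in (i - 1, i + 1):
                        g += 2 * acc
                    elif j == i:
                        g -= 4 * acc
                q[i][d] -= lr * g
    return q

# A zig-zag trajectory relaxes toward the straight line (0,0) -> (1,1).
traj = smooth([(0, 0), (0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (1, 1)])
```

CHOMP, TrajOpt, and STOMP refine this basic idea with obstacle costs, trust regions, and stochastic updates respectively, but the structure (local updates of waypoints under endpoint constraints) is the same.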

7. Open Challenges in TAMP

7.1 Scalability

  • Combinatorial explosion as the number of objects increases
  • Planning efficiency for long-horizon tasks
  • Multi-robot coordinated planning

7.2 Handling Uncertainty

  • Perception uncertainty: Object pose estimation errors
  • Execution uncertainty: Imprecise action effects
  • Environmental uncertainty: Dynamically changing environments

7.3 Integrating Learning and Planning

  • Learned heuristics to accelerate search
  • Learning decomposition methods from experience
  • Neurosymbolic methods: Continuous representations + discrete reasoning

References

  • Garrett, C. R. et al. (2020). "PDDLStream: Integrating Symbolic Planners and Blackbox Samplers"
  • Ahn, M. et al. (2022). "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances"
  • Liang, J. et al. (2023). "Code as Policies: Language Model Programs for Embodied Control"
  • Huang, W. et al. (2022). "Inner Monologue: Embodied Reasoning through Planning with Language Models"
  • LaValle, S. M. (2006). Planning Algorithms
  • Ghallab, M. et al. (2004). Automated Planning: Theory and Practice
