Task and Motion Planning Theory
Overview
Task and Motion Planning (TAMP) is one of the core problems in embodied intelligence: how to decompose high-level task goals into executable physical action sequences. This article systematically introduces the TAMP theoretical framework, including PDDL formalization, HTN hierarchical decomposition, PDDLStream sampling integration, and LLMs as a new paradigm for modern TAMP.
1. TAMP Problem Definition
1.1 Why TAMP Is Needed
Traditionally, task planning and motion planning were studied separately:
- Task Planning: Reasoning about action sequences at the symbolic level (e.g., "first pick up the cup, then pour water")
- Motion Planning: Generating collision-free trajectories at the geometric/physical level
However, in actual robot manipulation, the two are tightly coupled:
- Some symbolically feasible plans are geometrically infeasible (e.g., an object is occluded and cannot be directly grasped)
- Motion constraints influence task-level decisions (e.g., move an obstacle first)
- Both discrete decisions (what to do) and continuous parameters (how to do it) must be considered simultaneously
1.2 TAMP Architecture
```mermaid
flowchart TB
    subgraph "Task Layer"
        A[Task Goal] --> B[Symbolic Planner]
        B --> C[Action Skeleton<br/>pick→place→pour]
    end
    subgraph "Geometric Layer"
        C --> D[Parameter Sampling]
        D --> E[Motion Planning]
        E --> F[Feasibility Check]
    end
    F -->|Infeasible| B
    F -->|Feasible| G[Execute Trajectory]
    style A fill:#e3f2fd
    style B fill:#f3e5f5
    style C fill:#fff8e1
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#fce4ec
```
1.3 Formal Definition
A TAMP problem can be defined as a tuple \(\langle \mathcal{S}, \mathcal{A}, s_0, G, \mathcal{C} \rangle\):
- \(\mathcal{S} = \mathcal{S}_{\text{discrete}} \times \mathcal{S}_{\text{continuous}}\): Hybrid state space
- \(\mathcal{A}\): Action set, where each action has discrete and continuous parameters
- \(s_0\): Initial state
- \(G\): Goal condition
- \(\mathcal{C}\): Geometric/physical constraints
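To make the hybrid structure concrete, here is a minimal Python sketch of this tuple; all class and field names are illustrative assumptions for exposition, not a standard API.

```python
from dataclasses import dataclass

# Illustrative sketch of the hybrid TAMP state and action spaces.
# All names here are assumptions for exposition, not a standard API.

@dataclass(frozen=True)
class HybridState:
    discrete: frozenset   # symbolic facts, e.g. {("on", "cup", "table")}
    continuous: tuple     # e.g. object poses and the robot configuration

@dataclass(frozen=True)
class HybridAction:
    name: str                      # discrete choice: what to do
    discrete_params: tuple         # e.g. ("cup", "table")
    continuous_params: tuple = ()  # e.g. a sampled grasp pose

s0 = HybridState(
    discrete=frozenset({("on", "cup", "table"), ("arm-empty",)}),
    continuous=(0.4, 0.1, 0.75),   # toy cup position (x, y, z)
)
pick = HybridAction("pick", ("cup", "table"), (0.0, 0.0, 0.1))
```

The key point is that every action carries both a discrete part (which objects, which surfaces) and a continuous part (which grasp, which pose), and a planner must commit to both.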
2. PDDL: Planning Domain Definition Language
2.1 PDDL Basics
PDDL (Planning Domain Definition Language) is the standard formalization language for task planning, consisting of two parts:
Domain: Defines the predicates and action templates (object types may also be declared)
```lisp
(define (domain pick-and-place)
  (:predicates
    (on ?obj ?surface)
    (holding ?obj)
    (clear ?obj)
    (arm-empty))
  (:action pick
    :parameters (?obj ?surface)
    :precondition (and
      (on ?obj ?surface)
      (clear ?obj)
      (arm-empty))
    :effect (and
      (holding ?obj)
      (not (on ?obj ?surface))
      (not (arm-empty))
      (clear ?surface)))
  (:action place
    :parameters (?obj ?surface)
    :precondition (and
      (holding ?obj)
      (clear ?surface))
    :effect (and
      (on ?obj ?surface)
      (not (holding ?obj))
      (arm-empty)
      (not (clear ?surface)))))
```
Problem: Defines specific objects, initial state, and goals
```lisp
(define (problem example)
  (:domain pick-and-place)
  (:objects cup plate table counter)
  (:init
    (on cup table)
    (on plate counter)
    (clear cup)
    (clear plate)
    (clear table)
    (clear counter)
    (arm-empty))
  (:goal (and
    (on cup counter)
    (on plate table))))
```

Note that (clear table) and (clear counter) must appear in the initial state: without them, neither surface ever satisfies the place precondition while occupied, and the single-gripper swap has no solution.
2.2 Action Schema
Each action schema contains:
- Preconditions: Conditions that must be satisfied before execution
- Positive Effects: Predicates that become true after execution
- Negative Effects: Predicates that become false after execution
2.3 Planning Algorithms
Classical planning can be viewed as a graph search problem. Heuristic planners expand nodes in best-first order of the A* evaluation function:
\[ f(n) = g(n) + h(n) \]
where:
- \(g(n)\): Cost from the initial state to node \(n\)
- \(h(n)\): Heuristic estimate of the cost from \(n\) to the goal
Common Heuristics:
| Heuristic | Definition | Characteristics |
|---|---|---|
| \(h_{\text{add}}\) | Assumes subgoals are independent, sums individual costs | Not admissible, but informative |
| \(h_{\text{max}}\) | Takes maximum of individual subgoal costs | Admissible, but not tight |
| \(h_{\text{FF}}\) | Cost estimate from relaxed graph (ignoring delete effects) | Commonly used, good performance |
| \(h_{\text{LM-cut}}\) | Landmark-based lower bound on the optimal delete-relaxation cost \(h^+\) | Admissible, very tight |
Major Planners:
- Fast Downward: Supports multiple heuristics and search strategies
- LAMA: Landmark-count and FF heuristics with lazy multi-queue search; winner of the IPC 2008 and 2011 satisficing tracks
- Pyperplan: Python implementation, suitable for teaching
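The search loop behind these planners can be sketched in a few dozen lines. The following toy breadth-first STRIPS planner solves the pick-and-place problem above; real planners use heuristic search over far larger groundings, so this is only an illustration of grounding and state-space search, not how Fast Downward works internally.

```python
from collections import deque

# Toy breadth-first STRIPS planner for the pick-and-place domain above.
# States are frozensets of ground facts; actions are (name, pre, add, del).

def pick(obj, surf):
    pre = {("on", obj, surf), ("clear", obj), ("arm-empty",)}
    add = {("holding", obj), ("clear", surf)}
    delete = {("on", obj, surf), ("arm-empty",)}
    return (f"pick {obj} {surf}", pre, add, delete)

def place(obj, surf):
    pre = {("holding", obj), ("clear", surf)}
    add = {("on", obj, surf), ("arm-empty",)}
    delete = {("holding", obj), ("clear", surf)}
    return (f"place {obj} {surf}", pre, add, delete)

def bfs_plan(init, goal, actions):
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                       # goal facts all hold
            return steps
        for name, pre, add, delete in actions:
            if pre <= state:                    # preconditions satisfied
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

objects, surfaces = ["cup", "plate"], ["table", "counter"]
actions = [make(o, s) for o in objects for s in surfaces + objects
           if o != s for make in (pick, place)]
# table and counter start clear so that objects can be placed on them
init = {("on", "cup", "table"), ("on", "plate", "counter"),
        ("clear", "cup"), ("clear", "plate"),
        ("clear", "table"), ("clear", "counter"), ("arm-empty",)}
goal = {("on", "cup", "counter"), ("on", "plate", "table")}
plan_steps = bfs_plan(init, goal, actions)  # a 4-step plan: two picks, two places
```

BFS guarantees a shortest plan here (four actions); heuristic planners reach the same answer while expanding far fewer states on large problems.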
3. PDDLStream: Sampling Integration
3.1 Limitations of PDDL
Standard PDDL assumes all parameters come from finite discrete sets, but robot manipulation involves continuous parameters:
- Grasp pose \(T_{\text{grasp}} \in SE(3)\)
- Placement position \((x, y, z) \in \mathbb{R}^3\)
- Motion trajectory \(\tau: [0,1] \rightarrow \mathcal{C}\)
3.2 The PDDLStream Framework
Garrett et al. (2020) proposed PDDLStream, integrating continuous parameter sampling into symbolic planning through streams:
Stream Definition:
```lisp
(:stream sample-grasp
  :inputs (?obj)
  :domain (Graspable ?obj)
  :outputs (?grasp)
  :certified (GraspPose ?obj ?grasp))

(:stream plan-motion
  :inputs (?q1 ?q2)
  :domain (and (Config ?q1) (Config ?q2))
  :outputs (?traj)
  :certified (and (Motion ?q1 ?traj ?q2) (Collision-free ?traj)))
```
Workflow:
1. The symbolic planner generates an abstract plan skeleton
2. The corresponding streams are called to sample values for the skeleton's continuous parameters
3. The sampled values are checked against the geometric constraints
4. If infeasible, backtrack and try different samples or a different skeleton
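This loop can be sketched in a few lines. The code below is a toy illustration of the refinement loop, not the actual pddlstream API: the stream functions mirror the declarations above, a fixed skeleton pick(cup) → place(cup, shelf) is assumed, and the feasibility check is a stand-in for motion planning.

```python
import random

# Toy sketch of the PDDLStream refinement loop. Function names mirror the
# stream declarations above, but the code is illustrative, not the actual
# pddlstream API.

def sample_grasp(obj):
    """Stream sample-grasp: propose a grasp angle (toy stand-in for SE(3))."""
    return random.uniform(0.0, 360.0)

def sample_placement(obj, region):
    """Stream sample-placement: propose an (x, y) pose in the region."""
    return (random.uniform(0.0, 1.0), random.uniform(0.0, 1.0))

def feasible(grasp, placement):
    """Stands in for motion planning / collision checking."""
    x, y = placement
    return 0.2 < x < 0.8 and 0.2 < y < 0.8   # toy "reachable" region

def refine_skeleton(max_tries=100):
    """Refine a fixed skeleton pick(cup) -> place(cup, shelf) by sampling."""
    for _ in range(max_tries):
        grasp = sample_grasp("cup")
        placement = sample_placement("cup", "shelf")
        if feasible(grasp, placement):
            return [("pick", "cup", grasp), ("place", "cup", placement)]
    return None   # all samples failed: backtrack to another skeleton

random.seed(0)
refined = refine_skeleton()
```

Returning None corresponds to step 4 above: the symbolic planner is asked for a different skeleton when no sampled parameters satisfy the constraints.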
3.3 Sampling Strategies
Adaptive Sampling: Concentrate sampling effort on the streams and skeletons most likely to succeed
Learning-Based Sampling: Use neural networks to learn sampling distributions
4. Hierarchical Task Networks (HTN)
4.1 Basic Concepts
HTN (Hierarchical Task Network) progressively refines complex tasks through hierarchical decomposition:
Core Idea: Compound tasks are recursively decomposed, via methods, into ordered subtasks until only primitive, directly executable actions remain.
4.2 Formalization
An HTN domain is defined as \(\langle \mathcal{O}, \mathcal{M} \rangle\):
- \(\mathcal{O}\): Primitive operators -- directly executable actions
- \(\mathcal{M}\): Methods -- define how to decompose compound tasks
Method Definition: A method is a triple \(\langle \text{task}(m), \text{pre}(m), \text{subtasks}(m) \rangle\): the compound task it refines, a precondition governing when it is applicable, and an ordered sequence of subtasks.
Example:
```
Method: make-coffee
  Task: (prepare-drink coffee)
  Precondition: (and (have coffee-beans) (have water) (have cup))
  Subtasks:
    1. (grind coffee-beans)
    2. (boil water)
    3. (brew ground-coffee hot-water)
    4. (pour coffee cup)
```
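The decomposition itself is just recursive expansion. Here is a minimal Python sketch mirroring the make-coffee method; preconditions are omitted, and the dictionary-based representation is an illustrative assumption rather than a standard HTN planner API.

```python
# Minimal HTN decomposition sketch mirroring the make-coffee method above.
# Preconditions are omitted for brevity.

methods = {
    ("prepare-drink", "coffee"): [
        ("grind", "coffee-beans"),
        ("boil", "water"),
        ("brew", "ground-coffee", "hot-water"),
        ("pour", "coffee", "cup"),
    ],
}
primitives = {"grind", "boil", "brew", "pour"}  # directly executable operators

def decompose(task):
    """Recursively expand compound tasks until only primitives remain."""
    if task[0] in primitives:
        return [task]
    if task not in methods:
        raise ValueError(f"no method for task {task}")
    plan = []
    for subtask in methods[task]:
        plan.extend(decompose(subtask))
    return plan

primitive_plan = decompose(("prepare-drink", "coffee"))
```

A full HTN planner would additionally check method preconditions against the current state and backtrack over alternative methods for the same task.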
4.3 HTN vs. Classical Planning
| Dimension | Classical Planning (STRIPS/PDDL) | HTN |
|---|---|---|
| Goal Expression | Goal state predicates | Tasks to be completed |
| Search Space | State space | Decomposition space |
| Domain Knowledge | Action schemas only | Decomposition methods (more prior knowledge) |
| Plan Quality | Can guarantee optimality (some heuristics) | Depends on method quality |
| Computational Efficiency | Can be slow | Accelerated by domain knowledge |
5. LLMs as Modern TAMP
5.1 Zero-Shot Task Decomposition with LLMs
Large language models naturally possess task decomposition capabilities:
Traditional TAMP: Requires manually defined domains and methods
LLM-TAMP: Directly generates task plans from natural language instructions
```
User: "Help me make a cup of coffee"
LLM:
1. Walk to the kitchen counter
2. Pick up the coffee cup
3. Place it under the coffee machine
4. Press the coffee machine button
5. Wait for the coffee to brew
6. Pick up the coffee cup
7. Bring it to the user
```
5.2 Representative Works
SayCan (Ahn et al., 2022)
The LLM provides a semantic usefulness score and the robot's pretrained skills provide a feasibility (affordance) score; the skill maximizing their product is executed:
\[ \text{skill}^* = \arg\max_{k} \; p_{\text{LLM}}(k \mid \text{instruction}) \cdot p_{\text{affordance}}(k \mid s) \]
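In code, SayCan-style skill selection is just a product of two score tables. The numbers below are made up for illustration; in the real system the first score comes from LLM token likelihoods and the second from learned value functions.

```python
# Sketch of SayCan-style skill selection for "bring me a coke".
# All scores below are made-up numbers for illustration.

llm_score = {          # p(skill is useful for the instruction)
    "pick-coke": 0.6,
    "pick-sponge": 0.1,
    "go-to-counter": 0.3,
}
affordance_score = {   # p(skill succeeds from the current state)
    "pick-coke": 0.2,  # the coke is far away, so grasping would fail
    "pick-sponge": 0.9,
    "go-to-counter": 0.8,
}

combined = {k: llm_score[k] * affordance_score[k] for k in llm_score}
best = max(combined, key=combined.get)
```

Here the semantically best skill (pick-coke) is overruled by its low affordance, and the robot first navigates to the counter: grounding the language model in what is physically achievable.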
Code as Policies (Liang et al., 2023)
LLM directly generates executable Python code:
```python
# LLM-generated policy code (illustrative sketch)
def pour_water_into_cup():
    bottle = detect("water bottle")   # perception primitive
    cup = detect("cup")
    table = detect("table")
    grasp(bottle)
    move_to(above(cup))               # hover above the cup
    tilt(angle=90, duration=3)        # pour for 3 seconds
    place(bottle, on=table)
```
Inner Monologue (Huang et al., 2022)
Closed-loop LLM planning with environmental feedback:
```
LLM: Pick up the red cup
Environment Feedback: Grasp failed, cup is behind an obstacle
LLM: First move the obstacle aside
Environment Feedback: Successfully moved
LLM: Now pick up the red cup
```
5.3 LLM-TAMP vs. Traditional TAMP
| Dimension | Traditional TAMP | LLM-TAMP |
|---|---|---|
| Domain Definition | Requires manual PDDL | Natural language suffices |
| Generalization | Limited to predefined domains | Open-vocabulary, zero-shot |
| Reliability | Formal guarantees | May hallucinate |
| Optimality | Can be guaranteed | Not guaranteed |
| Physical Feasibility | Verified through geometric checks | Requires additional verification |
| Commonsense Reasoning | Limited | Powerful |
5.4 Integration Approaches
The latest research attempts to combine the advantages of both:
- LLM + PDDL: LLM generates PDDL domain definitions; traditional planners solve
- LLM + Verifier: LLM generates plans; formal tools verify feasibility
- LLM + Constraint Solver: LLM provides high-level structure; constraint solvers handle details
6. Motion Planning Fundamentals
6.1 Configuration Space
Motion planning is performed in the configuration space \(\mathcal{C}\):
- \(\mathcal{C}_{\text{free}}\): Free space (collision-free)
- \(\mathcal{C}_{\text{obs}}\): Obstacle space
- Goal: Find a path from \(q_{\text{start}}\) to \(q_{\text{goal}}\) in \(\mathcal{C}_{\text{free}}\)
6.2 Sampling-Based Planning
RRT (Rapidly-exploring Random Tree):
```
Algorithm RRT:
1. Initialize tree T = {q_start}
2. Repeat:
   a. Randomly sample q_rand in C
   b. Find nearest node q_near in T
   c. Extend from q_near toward q_rand by step Δq to get q_new
   d. If edge(q_near, q_new) is in C_free:
        Add q_new to T
3. Until q_goal is connected
```
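The pseudocode above maps directly onto a small 2-D implementation. The following sketch uses a point robot in the unit square with one circular obstacle; the scene and parameters are illustrative, and for simplicity only the new node is collision-checked rather than the whole edge (the step size is small relative to the obstacle).

```python
import math
import random

# Minimal 2-D RRT: a point robot in the unit square with one circular
# obstacle. Only q_new is collision-checked, not the whole edge.

OBSTACLE = ((0.5, 0.5), 0.15)    # (center, radius)
STEP = 0.05

def collision_free(q):
    (cx, cy), r = OBSTACLE
    return math.hypot(q[0] - cx, q[1] - cy) > r

def steer(q_near, q_rand):
    """Extend from q_near toward q_rand by at most STEP."""
    dx, dy = q_rand[0] - q_near[0], q_rand[1] - q_near[1]
    d = math.hypot(dx, dy)
    if d <= STEP:
        return q_rand
    return (q_near[0] + STEP * dx / d, q_near[1] + STEP * dy / d)

def rrt(q_start, q_goal, iters=5000, goal_bias=0.1):
    tree = {q_start: None}                       # node -> parent
    for _ in range(iters):
        if random.random() < goal_bias:          # occasionally aim at the goal
            q_rand = q_goal
        else:
            q_rand = (random.random(), random.random())
        q_near = min(tree, key=lambda q: math.hypot(q[0] - q_rand[0],
                                                    q[1] - q_rand[1]))
        q_new = steer(q_near, q_rand)
        if collision_free(q_new):
            tree[q_new] = q_near
            if math.hypot(q_new[0] - q_goal[0],
                          q_new[1] - q_goal[1]) < STEP:
                path = [q_new]                   # walk parents back to the root
                while tree[path[-1]] is not None:
                    path.append(tree[path[-1]])
                return path[::-1]
    return None

random.seed(1)
path = rrt((0.1, 0.1), (0.9, 0.9))
```

The linear nearest-neighbor scan is the main simplification; practical implementations use a k-d tree, and a production planner would also check the full edge for collisions.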
RRT* (Asymptotically Optimal):
Adds a rewire step to RRT that reconnects nearby nodes through cheaper parents, guaranteeing that as the number of samples \(n\) grows, the cost \(c_n\) of the best path found converges to the optimal cost \(c^*\):
\[ \Pr\left( \lim_{n \to \infty} c_n = c^* \right) = 1 \]
6.3 Trajectory Optimization
Motion planning can also be formulated as an optimization problem over the trajectory \(\tau\):
\[ \min_{\tau} \int_0^1 \Big( \|\dot{\tau}(t)\|^2 + \lambda \, c_{\text{obs}}(\tau(t)) \Big) \, dt \quad \text{s.t.} \quad \tau(0) = q_{\text{start}}, \; \tau(1) = q_{\text{goal}} \]
where the first term rewards smoothness and \(c_{\text{obs}}\) penalizes proximity to obstacles.
Representative Methods:
- CHOMP: Covariant Hamiltonian optimization
- TrajOpt: Sequential convex optimization
- STOMP: Stochastic trajectory optimization
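A tiny example makes the optimization view concrete. The sketch below runs gradient descent on a discretized 2-D trajectory, trading a finite-difference smoothness cost against a hinge obstacle penalty, loosely in the spirit of CHOMP; the scene, weights, and step sizes are arbitrary illustrative choices.

```python
import math

# Toy 2-D trajectory optimization: gradient descent on a smoothness cost
# plus an obstacle penalty, with fixed endpoints. All weights and the
# scene are arbitrary illustrative choices.

N = 30                                   # interior waypoints
obstacle, radius = (0.5, 0.05), 0.2      # circular obstacle near the line

# straight-line initialization from (0, 0) to (1, 0): passes through the obstacle
traj = [(i / (N + 1), 0.0) for i in range(N + 2)]

def obstacle_grad(p):
    """Gradient of a hinge penalty that grows as p enters the obstacle."""
    dx, dy = p[0] - obstacle[0], p[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d >= radius or d == 0.0:
        return (0.0, 0.0)
    s = -2.0 * (radius - d) / d          # pushes the waypoint outward
    return (s * dx, s * dy)

for _ in range(300):                     # gradient descent; endpoints stay fixed
    new = traj[:]
    for i in range(1, N + 1):
        gx = 2 * traj[i][0] - traj[i - 1][0] - traj[i + 1][0]
        gy = 2 * traj[i][1] - traj[i - 1][1] - traj[i + 1][1]
        ox, oy = obstacle_grad(traj[i])
        new[i] = (traj[i][0] - 0.2 * (gx + ox),
                  traj[i][1] - 0.2 * (gy + oy))
    traj = new
```

After optimization the interior waypoints bow away from the obstacle while the endpoints stay pinned, which is exactly the smoothness-versus-clearance trade-off the objective above encodes.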
7. Open Challenges in TAMP
7.1 Scalability
- Combinatorial explosion as the number of objects increases
- Planning efficiency for long-horizon tasks
- Multi-robot coordinated planning
7.2 Handling Uncertainty
- Perception uncertainty: Object pose estimation errors
- Execution uncertainty: Imprecise action effects
- Environmental uncertainty: Dynamically changing environments
7.3 Integrating Learning and Planning
- Learned heuristics to accelerate search
- Learning decomposition methods from experience
- Neurosymbolic methods: Continuous representations + discrete reasoning
References
- Garrett, C. R. et al. (2020). "PDDLStream: Integrating Symbolic Planners and Blackbox Samplers"
- Ahn, M. et al. (2022). "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances"
- Liang, J. et al. (2023). "Code as Policies: Language Model Programs for Embodied Control"
- Huang, W. et al. (2022). "Inner Monologue: Embodied Reasoning through Planning with Language Models"
- LaValle, S. M. (2006). Planning Algorithms
- Ghallab, M. et al. (2004). Automated Planning: Theory and Practice