Task and Motion Planning Theory

Overview

Task and Motion Planning (TAMP) is one of the core problems in embodied intelligence: how to decompose high-level task goals into executable physical action sequences. This article systematically introduces the TAMP theoretical framework, including PDDL formalization, HTN hierarchical decomposition, PDDLStream sampling integration, and LLMs as a new paradigm for modern TAMP engines.


1. TAMP Problem Definition

1.1 Why TAMP Is Needed

Traditionally, task planning and motion planning were studied separately:

  • Task Planning: Reasoning about action sequences at the symbolic level (e.g., "first pick up the cup, then pour water")
  • Motion Planning: Generating collision-free trajectories at the geometric/physical level

However, in actual robot manipulation, the two are tightly coupled:

  • Some symbolically feasible plans are geometrically infeasible (e.g., an object is occluded and cannot be directly grasped)
  • Motion constraints influence task-level decisions (e.g., move an obstacle first)
  • Both discrete decisions (what to do) and continuous parameters (how to do it) must be considered simultaneously

1.2 TAMP Architecture

flowchart TB
    subgraph Task Layer
        A[Task Goal] --> B[Symbolic Planner]
        B --> C[Action Skeleton<br/>pick→place→pour]
    end
    subgraph Geometric Layer
        C --> D[Parameter Sampling]
        D --> E[Motion Planning]
        E --> F[Feasibility Check]
    end
    F -->|Infeasible| B
    F -->|Feasible| G[Execute Trajectory]

    style A fill:#e3f2fd
    style B fill:#f3e5f5
    style C fill:#fff8e1
    style D fill:#e8f5e9
    style E fill:#e8f5e9
    style F fill:#fce4ec

1.3 Formal Definition

A TAMP problem can be defined as a tuple \(\langle \mathcal{S}, \mathcal{A}, s_0, G, \mathcal{C} \rangle\):

  • \(\mathcal{S} = \mathcal{S}_{\text{discrete}} \times \mathcal{S}_{\text{continuous}}\): Hybrid state space
  • \(\mathcal{A}\): Action set, where each action has discrete and continuous parameters
  • \(s_0\): Initial state
  • \(G\): Goal condition
  • \(\mathcal{C}\): Geometric/physical constraints

2. PDDL: Planning Domain Definition Language

2.1 PDDL Basics

PDDL (Planning Domain Definition Language) is the standard formalization language for task planning, consisting of two parts:

Domain: Defines types, predicates, and action templates

(define (domain pick-and-place)
  (:predicates
    (on ?obj ?surface)
    (holding ?obj)
    (clear ?obj)
    (arm-empty)
  )

  (:action pick
    :parameters (?obj ?surface)
    :precondition (and 
      (on ?obj ?surface) 
      (clear ?obj) 
      (arm-empty))
    :effect (and 
      (holding ?obj) 
      (not (on ?obj ?surface)) 
      (not (arm-empty))
      (clear ?surface))
  )

  (:action place
    :parameters (?obj ?surface)
    :precondition (and 
      (holding ?obj) 
      (clear ?surface))
    :effect (and 
      (on ?obj ?surface) 
      (not (holding ?obj)) 
      (arm-empty)
      (not (clear ?surface)))
  )
)

Problem: Defines specific objects, initial state, and goals

(define (problem example)
  (:domain pick-and-place)
  (:objects cup plate table counter)
  (:init 
    (on cup table) 
    (on plate counter) 
    (clear cup) 
    (clear plate) 
    (clear table)
    (clear counter)
    (arm-empty))
  (:goal (and 
    (on cup counter) 
    (on plate table)))
)

2.2 Action Schema

Each action schema contains:

\[\text{Action} = \langle \text{name}, \text{params}, \text{pre}, \text{eff}^+, \text{eff}^- \rangle\]
  • Preconditions: Conditions that must be satisfied before execution
  • Positive Effects: Predicates that become true after execution
  • Negative Effects: Predicates that become false after execution

2.3 Planning Algorithms

Classical planning can be viewed as a graph search problem:

\[f(n) = g(n) + h(n)\]

where:

  • \(g(n)\): Cost from the initial state to node \(n\)
  • \(h(n)\): Heuristic estimate from \(n\) to the goal

Common Heuristics:

| Heuristic | Definition | Characteristics |
|---|---|---|
| \(h_{\text{add}}\) | Assumes subgoals are independent; sums their individual costs | Not admissible, but informative |
| \(h_{\text{max}}\) | Takes the maximum of the individual subgoal costs | Admissible, but not tight |
| \(h_{\text{FF}}\) | Cost of a relaxed plan (ignoring delete effects) | Commonly used, good performance |
| \(h_{\text{LM-cut}}\) | Delete-relaxation bound computed from landmark cuts | Admissible, very tight |
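
Both \(h_{\text{add}}\) and \(h_{\text{max}}\) can be computed by the same cost-propagation fixpoint over the delete relaxation, differing only in how precondition costs are combined. A minimal sketch with unit action costs (the toy atoms and actions are illustrative, not from any benchmark domain):

```python
import math

def relaxed_costs(init_atoms, actions, combine):
    """Cheapest-achievement costs under the delete relaxation.

    actions: iterable of (preconditions, add_effects), unit cost each.
    combine = sum gives h_add; combine = max gives h_max."""
    cost = {a: 0 for a in init_atoms}
    changed = True
    while changed:  # propagate until no atom's cost improves
        changed = False
        for pre, add in actions:
            if all(p in cost for p in pre):
                base = combine([cost[p] for p in pre]) if pre else 0
                for q in add:
                    if cost.get(q, math.inf) > base + 1:
                        cost[q] = base + 1
                        changed = True
    return cost

# Toy domain: goal g needs both c (two steps away) and d (one step away).
acts = [({"a"}, {"b"}), ({"b"}, {"c"}), ({"a"}, {"d"}), ({"c", "d"}, {"g"})]
h_add = relaxed_costs({"a"}, acts, sum)   # h_add(g) = (2 + 1) + 1 = 4
h_max = relaxed_costs({"a"}, acts, max)   # h_max(g) = max(2, 1) + 1 = 3
```

The example shows why \(h_{\text{add}}\) is inadmissible: it double-counts the shared prefix of the two subgoals, while \(h_{\text{max}}\) stays a lower bound.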

Major Planners:

  • Fast Downward: Supports multiple heuristics and search strategies
  • LAMA: Multi-heuristic lazy search
  • Pyperplan: Python implementation, suitable for teaching
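
To make the search concrete, the pick-and-place example from Section 2.1 is small enough to solve with plain breadth-first search over ground states. The sketch below is plain Python, not any planner's actual API; the initial state includes clear facts for both surfaces, which the place precondition requires:

```python
from collections import deque

# Ground STRIPS encoding of the Section 2.1 example.
# A state is a frozenset of atoms; an action is (name, pre, add, del).
def ground_actions(objs=("cup", "plate"), surfaces=("table", "counter")):
    acts = []
    for o in objs:
        for s in surfaces:
            acts.append((f"pick {o} {s}",
                         {("on", o, s), ("clear", o), ("arm-empty",)},
                         {("holding", o), ("clear", s)},
                         {("on", o, s), ("arm-empty",)}))
            acts.append((f"place {o} {s}",
                         {("holding", o), ("clear", s)},
                         {("on", o, s), ("arm-empty",)},
                         {("holding", o), ("clear", s)}))
    return acts

def bfs_plan(init, goal):
    """Breadth-first search in state space; returns a shortest plan."""
    start = frozenset(init)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:  # goal predicates all hold
            return plan
        for name, pre, add, delete in ground_actions():
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [name]))
    return None

init = {("on", "cup", "table"), ("on", "plate", "counter"),
        ("clear", "cup"), ("clear", "plate"),
        ("clear", "table"), ("clear", "counter"), ("arm-empty",)}
goal = {("on", "cup", "counter"), ("on", "plate", "table")}
plan = bfs_plan(init, goal)  # four steps: two picks and two places
```

Real planners such as Fast Downward add grounding, heuristics, and far better data structures; the point here is only the state-transition semantics of the action schemas.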

3. PDDLStream: Sampling Integration

3.1 Limitations of PDDL

Standard PDDL assumes all parameters come from finite discrete sets, but robot manipulation involves continuous parameters:

  • Grasp pose \(T_{\text{grasp}} \in SE(3)\)
  • Placement position \((x, y, z) \in \mathbb{R}^3\)
  • Motion trajectory \(\tau: [0,1] \rightarrow \mathcal{C}\)

3.2 The PDDLStream Framework

Garrett et al. (2020) proposed PDDLStream, integrating continuous parameter sampling into symbolic planning through streams:

Stream Definition:

(:stream sample-grasp
  :inputs (?obj)
  :domain (Graspable ?obj)
  :outputs (?grasp)
  :certified (GraspPose ?obj ?grasp)
)

(:stream plan-motion
  :inputs (?q1 ?q2)
  :domain (and (Config ?q1) (Config ?q2))
  :outputs (?traj)
  :certified (and (Motion ?q1 ?traj ?q2) (Collision-free ?traj))
)
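
In the PDDLStream implementation, each such declaration is backed by a Python conditional generator. A schematic version of sample-grasp (the yaw-angle sampler is a placeholder assumption, not the library's actual API):

```python
import random

def sample_grasp(obj):
    """Backs the sample-grasp stream: given a graspable object,
    lazily yields candidate grasps (here: placeholder yaw angles
    about the object's vertical axis)."""
    while True:
        yaw = random.uniform(-3.14159, 3.14159)
        # each yielded tuple certifies (GraspPose ?obj ?grasp)
        yield (("grasp", obj, yaw),)

# The planner pulls values only when a skeleton needs them:
gen = sample_grasp("cup")
first_grasp = next(gen)[0]
```

Because streams are lazy, the planner can interleave symbolic search with sampling, drawing more candidates only for skeletons that survive the feasibility check.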

Workflow:

  1. The symbolic planner generates an abstract plan skeleton
  2. For continuous parameters in the skeleton, corresponding streams are called to sample
  3. Check whether the sampled results satisfy geometric constraints
  4. If not, backtrack and try a different skeleton or sampling

3.3 Sampling Strategies

Adaptive Sampling: Prioritize samplers that are more likely to succeed, down-weighting those that have failed often

\[p(\text{sample}) \propto \exp\left(-\frac{\text{failure\_count}}{\tau}\right)\]

Learning-Based Sampling: Use neural networks to learn sampling distributions

\[T_{\text{grasp}} \sim q_\phi(T | o_{\text{obj}}, s_{\text{scene}})\]
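
The adaptive rule is a softmax over negative failure counts; a minimal sketch that selects among several candidate samplers (the counts are illustrative):

```python
import math, random

def pick_sampler(failure_counts, tau=2.0):
    """Choose a sampler index with probability proportional to
    exp(-failures / tau), so samplers that keep failing are tried less."""
    weights = [math.exp(-f / tau) for f in failure_counts]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):  # roulette-wheel selection
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

counts = [0, 3, 12]            # failures per grasp sampler so far
choice = pick_sampler(counts)  # index 0 is the most likely pick
```

The temperature \(\tau\) controls how aggressively failures are punished: large \(\tau\) keeps exploring all samplers, small \(\tau\) quickly concentrates on the best one.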

4. Hierarchical Task Networks (HTN)

4.1 Basic Concepts

HTN (Hierarchical Task Network) planning progressively refines complex tasks into primitive actions through recursive decomposition:

Core Idea:

\[\text{Abstract Task} \xrightarrow{\text{Decomposition Method}} \text{Subtask Sequence} \xrightarrow{\text{Recursive Decomposition}} \text{Primitive Actions}\]

4.2 Formalization

An HTN domain is defined as \(\langle \mathcal{O}, \mathcal{M} \rangle\):

  • \(\mathcal{O}\): Primitive operators -- directly executable actions
  • \(\mathcal{M}\): Methods -- define how to decompose compound tasks

Method Definition:

\[m = \langle \text{task}, \text{precondition}, \text{subtasks} \rangle\]

Example:

Method: make-coffee
  Task: (prepare-drink coffee)
  Precondition: (and (have coffee-beans) (have water) (have cup))
  Subtasks: 
    1. (grind coffee-beans)
    2. (boil water)
    3. (brew ground-coffee hot-water)
    4. (pour coffee cup)
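
This refinement can be sketched as depth-first decomposition: tasks with a method expand recursively, everything else is primitive. The method table below follows the example (with a hypothetical sub-method for boil-water to show recursion); precondition checks are omitted for brevity:

```python
# Methods map each compound task to its ordered subtasks (Section 4.2).
METHODS = {
    "prepare-drink coffee": [
        "grind coffee-beans",
        "boil water",
        "brew ground-coffee hot-water",
        "pour coffee cup",
    ],
    "boil water": ["fill kettle", "heat kettle"],  # hypothetical sub-method
}

def decompose(task):
    """Depth-first refinement: expand compound tasks via their method,
    emit primitive tasks as-is."""
    if task not in METHODS:
        return [task]
    plan = []
    for sub in METHODS[task]:
        plan.extend(decompose(sub))
    return plan

plan = decompose("prepare-drink coffee")
```

A full HTN planner additionally checks method preconditions against the current state and backtracks over alternative methods when a decomposition fails.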

4.3 HTN vs. Classical Planning

| Dimension | Classical Planning (STRIPS/PDDL) | HTN |
|---|---|---|
| Goal expression | Goal-state predicates | Tasks to be completed |
| Search space | State space | Decomposition space |
| Domain knowledge | Action schemas only | Decomposition methods (more prior knowledge) |
| Plan quality | Optimality can be guaranteed (with some heuristics) | Depends on method quality |
| Computational efficiency | Can be slow | Accelerated by domain knowledge |

5. LLMs as Modern TAMP

5.1 Zero-Shot Task Decomposition with LLMs

Large language models naturally possess task decomposition capabilities:

Traditional TAMP: Requires manually defined domains and methods

LLM-TAMP: Directly generates task plans from natural language instructions

User: "Help me make a cup of coffee"
LLM: 
1. Walk to the kitchen counter
2. Pick up the coffee cup
3. Place it under the coffee machine
4. Press the coffee machine button
5. Wait for the coffee to brew
6. Pick up the coffee cup
7. Bring it to the user

5.2 Representative Works

SayCan (Ahn et al., 2022)

LLM provides a semantic score, robot skills provide a feasibility score:

\[a^* = \arg\max_a \underbrace{p_{\text{LLM}}(\text{useful} | a, l)}_{\text{Semantic Score}} \cdot \underbrace{p_{\text{affordance}}(\text{possible} | a, s)}_{\text{Feasibility Score}}\]
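
The selection rule is simply an argmax over the product of the two scores; a toy version with hand-set probabilities (the skill names and numbers are made up for illustration):

```python
def saycan_select(skills, semantic, affordance):
    """Pick the skill maximizing p_LLM(useful) * p_affordance(possible)."""
    return max(skills, key=lambda a: semantic[a] * affordance[a])

skills = ["pick cup", "open fridge", "wipe table"]
semantic = {"pick cup": 0.7, "open fridge": 0.9, "wipe table": 0.2}
affordance = {"pick cup": 0.8, "open fridge": 0.3, "wipe table": 0.9}
best = saycan_select(skills, semantic, affordance)  # 0.56 beats 0.27 and 0.18
```

Note how the product filters out "open fridge": the LLM finds it semantically useful, but the affordance model judges it infeasible in the current state.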

Code as Policies (Liang et al., 2023)

LLM directly generates executable Python code:

# LLM-generated policy code
def pour_water_into_cup():
    bottle = detect("water bottle")
    cup = detect("cup")
    table = detect("table")
    grasp(bottle)
    move_to(above(cup))
    tilt(angle=90, duration=3)
    place(bottle, on=table)

Inner Monologue (Huang et al., 2022)

Closed-loop LLM planning with environmental feedback:

LLM: Pick up the red cup
Environment Feedback: Grasp failed, cup is behind an obstacle
LLM: First move the obstacle aside
Environment Feedback: Successfully moved
LLM: Now pick up the red cup
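
This pattern amounts to replanning after every feedback message, with the accumulated dialogue fed back into the next query. Schematically (llm and execute here are stand-in callables, not any real API):

```python
def inner_monologue(goal, llm, execute, max_steps=10):
    """Closed-loop planning: alternate LLM proposals with environment
    feedback, feeding the full history back into the next query."""
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = llm(history)        # next step given goal + feedback so far
        if action == "done":
            break
        feedback = execute(action)   # e.g. "success" or a failure message
        history += [f"Act: {action}", f"Feedback: {feedback}"]
    return history
```

The key design choice is that failure messages stay in the prompt, so the model can revise its plan (move the obstacle) instead of blindly retrying the failed action.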

5.3 LLM-TAMP vs. Traditional TAMP

| Dimension | Traditional TAMP | LLM-TAMP |
|---|---|---|
| Domain definition | Requires manual PDDL | Natural language suffices |
| Generalization | Limited to predefined domains | Open-vocabulary, zero-shot |
| Reliability | Formal guarantees | May hallucinate |
| Optimality | Can be guaranteed | Not guaranteed |
| Physical feasibility | Verified through geometric checks | Requires additional verification |
| Commonsense reasoning | Limited | Powerful |

5.4 Integration Approaches

Recent research combines the strengths of both approaches:

  • LLM + PDDL: LLM generates PDDL domain definitions; traditional planners solve
  • LLM + Verifier: LLM generates plans; formal tools verify feasibility
  • LLM + Constraint Solver: LLM provides high-level structure; constraint solvers handle details

6. Motion Planning Fundamentals

6.1 Configuration Space

Motion planning is performed in the configuration space \(\mathcal{C}\):

  • \(\mathcal{C}_{\text{free}}\): Free space (collision-free)
  • \(\mathcal{C}_{\text{obs}}\): Obstacle space
  • Goal: Find a path from \(q_{\text{start}}\) to \(q_{\text{goal}}\) in \(\mathcal{C}_{\text{free}}\)

6.2 Sampling-Based Planning

RRT (Rapidly-exploring Random Tree):

Algorithm RRT:
1. Initialize tree T = {q_start}
2. Repeat:
   a. Randomly sample q_rand in C
   b. Find nearest node q_near in T
   c. Extend toward q_rand by step Δq to get q_new
   d. If edge(q_near, q_new) is in C_free:
      Add q_new to T
3. Until q_goal is connected
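
A minimal 2D instantiation of this pseudocode in the unit square, with an empty workspace so every extension is trivially collision-free (a real planner would validate edge(q_near, q_new) against C_obs before adding it):

```python
import math, random

def rrt(q_start, q_goal, step=0.1, goal_tol=0.15, iters=5000, seed=0):
    """Rapidly-exploring Random Tree in the unit square.

    Returns a path from q_start to (near) q_goal, or None.
    parent[i] stores each node's tree edge for path reconstruction."""
    rng = random.Random(seed)
    nodes, parent = [q_start], {0: None}
    for _ in range(iters):
        q_rand = (rng.random(), rng.random())
        # nearest node in the tree
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d < 1e-9:
            continue
        # extend one step toward q_rand (edge assumed collision-free here)
        q_new = (q_near[0] + step * (q_rand[0] - q_near[0]) / d,
                 q_near[1] + step * (q_rand[1] - q_near[1]) / d)
        nodes.append(q_new)
        parent[len(nodes) - 1] = i_near
        if math.dist(q_new, q_goal) < goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:       # walk parents back to the root
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

path = rrt((0.05, 0.05), (0.95, 0.95))
```

The brute-force nearest-neighbor search makes each iteration O(n); practical implementations use a k-d tree, and many add goal biasing (occasionally sampling q_goal itself) to speed convergence.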

RRT* (Asymptotically Optimal):

Adds a rewire step to RRT, guaranteeing that as the number of samples increases, the path cost converges (almost surely) to the optimal cost:

\[\lim_{n \to \infty} \text{cost}(\text{RRT}^*) = \text{cost}^*\]

6.3 Trajectory Optimization

Formulating motion planning as an optimization problem:

\[\min_\tau \int_0^T \left[ \|\ddot{q}(t)\|^2 + \lambda \cdot c_{\text{obs}}(q(t)) \right] dt\]
\[\text{s.t.} \quad q(0) = q_{\text{start}}, \quad q(T) = q_{\text{goal}}, \quad q(t) \in \mathcal{C}_{\text{free}}\]

Representative Methods:

  • CHOMP: Covariant Hamiltonian optimization
  • TrajOpt: Sequential convex optimization
  • STOMP: Stochastic trajectory optimization
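
With the obstacle term dropped (\(c_{\text{obs}} = 0\)) and time discretized, the objective reduces to the sum of squared second differences of the waypoints, whose minimizer with pinned endpoints is the straight line. Coordinate descent on this cost gives a minimal sketch of the trajectory-optimization view (step size and iteration count are ad hoc, not from any of the methods above):

```python
def smooth(waypoints, iters=2000, lr=0.05):
    """Minimize sum_j |q[j-1] - 2 q[j] + q[j+1]|^2 (discrete ||q_ddot||^2)
    over interior waypoints; the endpoints stay fixed."""
    q = [list(p) for p in waypoints]
    n, dim = len(q), len(q[0])
    for _ in range(iters):
        for i in range(1, n - 1):
            for d in range(dim):
                g = 0.0  # d/dq[i][d] of the summed squared accelerations
                for j in range(1, n - 1):
                    acc = q[j - 1][d] - 2 * q[j][d] + q[j + 1][d]
                    if j in (i - 1, i + 1):
                        g += 2 * acc
                    elif j == i:
                        g -= 4 * acc
                q[i][d] -= lr * g
    return q

# A zig-zag trajectory relaxes toward the straight line (0,0) -> (1,1).
traj = smooth([(0, 0), (0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (1, 1)])
```

CHOMP, TrajOpt, and STOMP refine this basic idea with obstacle costs, trust regions, and stochastic updates respectively, but the structure (local updates of waypoints under endpoint constraints) is the same.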

7. Open Challenges in TAMP

7.1 Scalability

  • Combinatorial explosion as the number of objects increases
  • Planning efficiency for long-horizon tasks
  • Multi-robot coordinated planning

7.2 Handling Uncertainty

  • Perception uncertainty: Object pose estimation errors
  • Execution uncertainty: Imprecise action effects
  • Environmental uncertainty: Dynamically changing environments

7.3 Integrating Learning and Planning

  • Learned heuristics to accelerate search
  • Learning decomposition methods from experience
  • Neurosymbolic methods: Continuous representations + discrete reasoning

References

  • Garrett, C. R. et al. (2020). "PDDLStream: Integrating Symbolic Planners and Blackbox Samplers"
  • Ahn, M. et al. (2022). "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances"
  • Liang, J. et al. (2023). "Code as Policies: Language Model Programs for Embodied Control"
  • Huang, W. et al. (2022). "Inner Monologue: Embodied Reasoning through Planning with Language Models"
  • LaValle, S. M. (2006). Planning Algorithms
  • Ghallab, M. et al. (2004). Automated Planning: Theory and Practice
