Plan-Execute Frameworks

Overview

Plan-and-Execute is an agent architecture that separates planning (generating a step list) from execution (completing steps one by one). This separation allows different models to handle different task phases and supports dynamic re-planning. This article provides an in-depth analysis of the Plan-and-Execute pattern, the LLMCompiler parallel execution framework, and Hierarchical Task Network (HTN) planning.


1. Plan-and-Execute Pattern

1.1 Core Architecture

graph TD
    TASK[User Task] --> PLANNER[Planner LLM]
    PLANNER --> PLAN[Step List<br/>Step 1, 2, ..., N]
    PLAN --> EXEC1[Executor executes Step 1]
    EXEC1 --> RESULT1[Result 1]
    RESULT1 --> EXEC2[Executor executes Step 2]
    EXEC2 --> RESULT2[Result 2]
    RESULT2 --> DOTS[...]
    DOTS --> EXECN[Executor executes Step N]
    EXECN --> RESULTN[Result N]
    RESULTN --> REPLAN{Need re-planning?}
    REPLAN -->|Yes| PLANNER
    REPLAN -->|No| FINAL[Aggregate Final Result]

1.2 Comparison with ReAct

| Dimension | ReAct | Plan-and-Execute |
| --- | --- | --- |
| Planning approach | Decides the next step greedily, one step at a time | Generates a complete plan first |
| Global view | None (sees only the current step) | Yes (global plan) |
| LLM usage | Large model at every step | Large model for planning, small model for execution |
| Cost | Evenly distributed across steps | High planning cost, low execution cost |
| Adaptability | High (real-time adjustment) | Requires explicit re-planning |
| Interpretability | Medium | High (plan steps are visible) |

1.3 Two-Phase Design

Phase 1: Planning

PLANNER_PROMPT = """
Given the user task, generate a step-by-step execution plan.
Each step should be a clear, executable instruction.

Task: {task}

Please output a numbered list, one step per line:
1. ...
2. ...
"""

plan = planner_llm.generate(PLANNER_PROMPT.format(task=task))
steps = parse_plan(plan)
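The parse_plan helper is referenced but not defined; a minimal version, assuming the planner follows the numbered-list format requested in the prompt, might be:

```python
import re

def parse_plan(plan_text: str) -> list[str]:
    """Extract steps from a numbered list like '1. ...', one step per line."""
    steps = []
    for line in plan_text.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            steps.append(match.group(1).strip())
    return steps
```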

Phase 2: Execution

EXECUTOR_PROMPT = """
You need to execute the following step using available tools.

Current step: {step}
Previous execution results: {previous_results}
Available tools: {tools}

Please execute the current step.
"""

results = []
for step in steps:
    result = executor_llm.generate(
        EXECUTOR_PROMPT.format(
            step=step,
            previous_results=results,
            tools=tools
        )
    )
    results.append(result)

1.4 Dynamic Re-Planning

When unexpected situations arise during execution, trigger re-planning:

REPLAN_PROMPT = """
Original plan: {original_plan}
Completed steps and results: {completed_steps}
Current issue: {issue}

Please modify the remaining plan based on the current situation:
"""

def should_replan(step_result, expected):
    """Determine whether re-planning is needed"""
    # Execution failure
    if step_result.error:
        return True
    # Result deviates significantly from expectation
    if llm.evaluate(step_result, expected) < threshold:
        return True
    # New information discovered that changes the nature of the problem
    if llm.detect_new_info(step_result):
        return True
    return False
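Tying the two phases together with the re-planning check, the outer control loop might look like the following sketch (planner, executor, and the needs_replan flag are illustrative interfaces, not a specific library's API):

```python
def plan_and_execute(task, planner, executor, max_replans=3):
    """Plan once, execute step by step, and re-plan on demand (sketch)."""
    steps = planner(task)              # initial plan: list of step strings
    results, replans, i = [], 0, 0
    while i < len(steps):
        result = executor(steps[i], results)
        if result.get("needs_replan") and replans < max_replans:
            # Keep completed steps; ask the planner for a revised remainder
            steps = steps[:i + 1] + planner(task, completed=results)
            replans += 1
        results.append(result)
        i += 1
    return results
```

The max_replans cap bounds the cost of repeated re-planning, mirroring the "maximum steps" safeguard discussed later.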

2. LLMCompiler: Parallel Execution

2.1 Core Idea

LLMCompiler, proposed by Kim et al. (2024), decomposes tasks into a directed acyclic graph (DAG), identifying steps that can be executed in parallel:

graph TD
    TASK[Task: Compare weather and population of Beijing and Shanghai] --> PLAN[LLM Planner<br/>Generate Task DAG]
    PLAN --> T1[Task 1: Query Beijing weather]
    PLAN --> T2[Task 2: Query Shanghai weather]
    PLAN --> T3[Task 3: Query Beijing population]
    PLAN --> T4[Task 4: Query Shanghai population]
    T1 --> JOIN[Joiner<br/>Aggregate all results]
    T2 --> JOIN
    T3 --> JOIN
    T4 --> JOIN
    JOIN --> ANS[Final comparative analysis]

2.2 Three Major Components

Planner

Generates a task list with dependency relationships:

Task: Compare weather and population of Beijing and Shanghai

1. search("Beijing current weather")        # No dependencies
2. search("Shanghai current weather")        # No dependencies
3. search("Beijing population 2024")         # No dependencies
4. search("Shanghai population 2024")        # No dependencies
5. join()                                    # Depends on 1,2,3,4

Key: The Planner not only generates the task list but also annotates dependency relationships, enabling tasks without dependencies to be executed in parallel.

Task Fetching Unit

Identifies parallel opportunities in the DAG:

\[ \text{ParallelSet}(t) = \{t_i \mid \text{deps}(t_i) \subseteq \text{completed}\} \]

That is, all tasks whose dependencies are complete can be launched in parallel.
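A minimal task-fetching loop implementing this rule could be sketched as follows (this is an illustration of the scheduling idea, not the LLMCompiler implementation; run_fn stands in for the tool-calling executor):

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps, run_fn):
    """Execute a task DAG, launching every ready task in parallel.

    tasks: dict mapping task_id -> payload
    deps:  dict mapping task_id -> set of prerequisite task_ids
    run_fn(task_id, payload, results_so_far) -> result
    """
    results = {}
    pending = set(tasks)
    with ThreadPoolExecutor() as pool:
        while pending:
            # ParallelSet(t): all pending tasks whose dependencies are completed
            ready = [t for t in pending
                     if deps.get(t, set()).issubset(results)]
            if not ready:
                raise RuntimeError("dependency cycle in task graph")
            futures = {t: pool.submit(run_fn, t, tasks[t], dict(results))
                       for t in ready}
            for t, f in futures.items():
                results[t] = f.result()
            pending -= set(ready)
    return results
```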

Joiner

Aggregates results from parallel execution, deciding whether re-planning is needed:

def joiner(results, original_task):
    # Check if all necessary results are available
    if all_results_available(results):
        return llm.synthesize(results, original_task)
    else:
        # Partial failure, decide re-planning strategy
        return replan(results, original_task)

2.3 Performance Advantages

| Metric | ReAct | Plan-Execute (serial) | LLMCompiler (parallel) |
| --- | --- | --- | --- |
| Latency | \(N \times L\) | \(N \times l\) | \(D \times l\) |
| LLM calls | \(N\) | \(N + 1\) | \(1 + 1\) (plan + join) |
| Tool calls | \(N\) (serial) | \(N\) (serial) | \(N\) (parallel) |

where \(N\) is the number of steps, \(L\) is large model latency, \(l\) is small model/tool latency, and \(D\) is the longest path depth in the DAG.
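As a concrete illustration, take the Beijing/Shanghai example above: assuming \(N = 4\) independent searches, \(L = 2\,\text{s}\), \(l = 0.5\,\text{s}\), and DAG depth \(D = 1\) (all four searches are independent), the latencies work out to:

\[ T_{\text{ReAct}} = 4 \times 2\,\text{s} = 8\,\text{s}, \qquad T_{\text{serial}} = 4 \times 0.5\,\text{s} = 2\,\text{s}, \qquad T_{\text{parallel}} = 1 \times 0.5\,\text{s} = 0.5\,\text{s} \]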


3. Hierarchical Task Network (HTN)

3.1 HTN Planning Overview

Hierarchical Task Networks are a classical AI hierarchical planning method that decomposes complex tasks top-down into subtasks:

\[ \text{Task} \rightarrow \text{Method} \rightarrow \text{Subtask}_1, \text{Subtask}_2, \ldots \]
graph TD
    T0[Prepare Dinner] --> M1[Method: Cook Chinese Food]
    M1 --> T1[Buy Groceries]
    M1 --> T2[Cook]
    M1 --> T3[Plate]
    T1 --> T11[Write Shopping List]
    T1 --> T12[Go to Supermarket]
    T1 --> T13[Select Ingredients]
    T2 --> T21[Wash and Cut Vegetables]
    T2 --> T22[Stir-Fry]
    T2 --> T23[Cook Rice]

3.2 Combining HTN with LLM

LLMs are naturally suited for hierarchical task decomposition:

HTN_DECOMPOSE_PROMPT = """
Decompose the following task into subtasks. Each subtask should be
either a directly executable atomic operation or a compound task
that can be further decomposed.

Task: {task}
Available atomic operations: {primitive_actions}

Please output the hierarchical decomposition:
Task: {task}
├── Subtask 1: ...
│   ├── Atomic operation: ...
│   └── Atomic operation: ...
├── Subtask 2: ...
└── Subtask 3: ...
"""

3.3 Advantages of HTN in Agents

| Advantage | Description |
| --- | --- |
| Reusability | Decomposition methods can be reused across tasks |
| Abstraction levels | Reason and monitor at the appropriate level of detail |
| Scalability | New decomposition methods can be added incrementally |
| Interpretability | The hierarchical structure clearly shows the task logic |
| Failure recovery | Failed subtasks can be retried at the subtask level |

4. Advanced Planning Strategies

4.1 Adaptive Planning

Dynamically adjust plan granularity and content based on information gained during execution:

def adaptive_planning(task, initial_confidence):
    if initial_confidence > 0.9:
        # High confidence: generate a detailed plan and execute it in one pass
        plan = detailed_plan(task)
        return execute_all(plan)
    elif initial_confidence > 0.5:
        # Medium confidence: rough plan + incremental refinement
        results = []
        for step in rough_plan(task):
            detailed_step = refine_step(step, context)
            result = execute(detailed_step)
            context.update(result)
            results.append(result)
        return results
    else:
        # Low confidence: exploratory execution (fall back to ReAct)
        return react_loop(task)

4.2 Speculative Planning

Similar to CPU speculative execution, predict possible branches and compute ahead:

Plan Step 3: Query user's account status
  Predicted Result A (80% likely): Account normal → Pre-prepare Step 4A
  Predicted Result B (20% likely): Account abnormal → Pre-prepare Step 4B
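This idea can be sketched as follows: pre-compute follow-up steps for likely outcomes while the real query runs, then keep only the branch that actually occurred (the probability threshold and prepare_fn are illustrative assumptions):

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_step(query_fn, branches, prepare_fn):
    """Run a step while pre-computing likely follow-ups (sketch).

    branches: list of (predicted_outcome, probability) pairs
    prepare_fn(outcome) pre-computes the follow-up step for an outcome.
    Work for branches not taken is discarded, as in CPU speculation.
    """
    with ThreadPoolExecutor() as pool:
        # Launch speculative preparation for sufficiently likely branches
        speculations = {
            outcome: pool.submit(prepare_fn, outcome)
            for outcome, prob in branches if prob >= 0.2
        }
        actual = query_fn()  # the real step (e.g. query account status)
        if actual in speculations:
            return speculations[actual].result()  # speculation hit
        return prepare_fn(actual)                 # miss: compute now
```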

4.3 Constrained Planning

Add explicit constraints to planning:

\[ \text{Plan}^* = \arg\max_{\pi} R(\pi) \quad \text{s.t.} \quad C(\pi) \leq \text{budget} \]

Constraint types:

| Constraint | Example |
| --- | --- |
| Time constraint | Total execution time < 60 seconds |
| Cost constraint | LLM API spend < $0.50 |
| Safety constraint | No delete/modify operations |
| Quality constraint | Per-step verification pass rate > 95% |
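Plan selection under such constraints reduces to a filter-then-argmax over candidate plans, as in this sketch (reward_fn and cost_fn are hypothetical estimators):

```python
def select_plan(candidates, reward_fn, cost_fn, budget):
    """Return the highest-reward plan whose estimated cost fits the budget."""
    feasible = [p for p in candidates if cost_fn(p) <= budget]
    if not feasible:
        raise ValueError("no candidate plan satisfies the constraints")
    return max(feasible, key=reward_fn)
```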

5. Framework Implementation

5.1 Plan-and-Execute in LangGraph

from typing import TypedDict

from langgraph.graph import END, StateGraph

# Define state
class PlanExecuteState(TypedDict):
    task: str
    plan: list[str]
    current_step: int
    results: list[str]
    final_answer: str

# Build graph
workflow = StateGraph(PlanExecuteState)

# Add nodes
workflow.add_node("planner", plan_step)
workflow.add_node("executor", execute_step)
workflow.add_node("replanner", replan_step)

# Add edges
workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_conditional_edges(
    "executor",
    should_continue,
    {"replan": "replanner", "next": "executor", "end": END}
)
workflow.add_edge("replanner", "executor")

Cross-Reference

For a detailed introduction to the LangGraph framework, see LangChain and LangGraph.

5.2 Production Deployment Considerations

| Consideration | Recommendation |
| --- | --- |
| Planning model | Use the strongest model available (e.g., GPT-4, Claude Opus) to ensure plan quality |
| Execution model | Smaller models (e.g., GPT-3.5, Claude Haiku) can reduce cost |
| Re-planning threshold | Should not be too sensitive, to avoid frequent re-planning |
| Maximum steps | Set an upper limit (e.g., 20 steps) to prevent infinite loops |
| Step granularity | Each step should be a clear, verifiable operation |
| Error handling | Distinguish retryable errors from fatal errors |

6. Plan Quality Evaluation

6.1 Quality Dimensions of Plans

\[ Q(\text{plan}) = w_1 \cdot \text{Completeness} + w_2 \cdot \text{Executability} + w_3 \cdot \text{Efficiency} + w_4 \cdot \text{Robustness} \]

| Dimension | Definition | Evaluation Method |
| --- | --- | --- |
| Completeness | Does the plan cover all necessary steps? | Check whether the goal is reachable |
| Executability | Can each step actually be executed? | Check tool/API availability |
| Efficiency | Is the number of steps minimal? | Compare with an optimal plan |
| Robustness | Tolerance for unexpected situations | Recovery ability after error injection |
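Given per-dimension scores (e.g., from an LLM judge or automated checks, both assumptions here), the weighted score is a direct translation of the formula above:

```python
def plan_quality(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum Q(plan) over the quality dimensions (sketch)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(weights[d] * scores[d] for d in weights)
```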

References

  1. Wang, L. et al. (2023). Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models. ACL 2023.
  2. Kim, S. et al. (2024). An LLM Compiler for Parallel Function Calling. arXiv:2312.04511.
  3. Erol, K. et al. (1994). HTN Planning: Complexity and Expressivity. AAAI 1994.
  4. Huang, W. et al. (2022). Inner Monologue: Embodied Reasoning through Planning with Language Models. CoRL 2022.
  5. Sun, H. et al. (2023). AdaPlanner: Adaptive Planning from Feedback with Language Models. NeurIPS 2023.
