Reasoning Patterns

Large language models (LLMs) have inherently limited reasoning capabilities — they tend to produce answers in a single step rather than through deliberate, multi-step reasoning. To address this limitation, researchers have proposed a variety of reasoning patterns that enable LLMs to solve complex problems more systematically and reliably. These reasoning patterns form the core foundation of modern AI agents.


CoT (Chain of Thought)

Fundamentals

CoT (Chain of Thought) serves as the foundation for all modern agent frameworks. The core idea is remarkably intuitive: instead of having the model directly output a final answer, we guide it to demonstrate its reasoning process step by step — much like a human writing intermediate steps on scratch paper when solving a problem.

The theoretical basis for CoT is that when a model is asked to generate intermediate reasoning steps, it can decompose complex problems into simpler sub-problems, thereby significantly improving performance on tasks such as arithmetic reasoning, commonsense reasoning, and symbolic reasoning.

Few-shot CoT

Few-shot CoT was proposed by Wei et al. (2022) in the paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. The approach provides several exemplars with detailed reasoning steps in the prompt, guiding the model to imitate this step-by-step reasoning pattern:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
   Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 tennis balls. 2 cans of 3 tennis balls each
   is 2×3=6 tennis balls. 5+6=11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 and then bought 6 more,
   how many apples do they have?
A: The cafeteria started with 23 apples. They used 20, so they had
   23-20=3. They bought 6 more, so 3+6=9. The answer is 9.
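The prompt assembly behind few-shot CoT can be sketched in a few lines of Python. The `build_cot_prompt` helper and the exemplar formatting are illustrative, not an API from the paper; the resulting string would be sent to any completion endpoint:

```python
# One worked exemplar (from Wei et al., 2022); real prompts typically use several.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 tennis balls. 2 cans of 3 tennis balls each "
    "is 2*3=6 tennis balls. 5+6=11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend worked exemplars so the model imitates step-by-step reasoning."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 and then bought 6 more, "
    "how many apples do they have?"
)
# The model is expected to continue the "A:" turn with its own reasoning
# chain ending in a final "The answer is ..." line.
```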

Zero-shot CoT

Kojima et al. (2022) discovered that simply appending the phrase "Let's think step by step" to the end of a prompt can elicit step-by-step reasoning without providing any exemplars. This finding is highly significant — it suggests that large language models already possess latent chain reasoning capabilities, which can be unlocked with a simple trigger instruction.

Q: A grocer had some apples. After selling 40%, he had 420 left.
   How many did he start with?
A: Let's think step by step.
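Because zero-shot CoT needs no exemplars, the prompt construction reduces to appending the trigger phrase. A minimal sketch (the helper name is illustrative):

```python
def zero_shot_cot(question: str) -> str:
    """Append the trigger phrase from Kojima et al. (2022) to elicit reasoning."""
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot(
    "A grocer had some apples. After selling 40%, he had 420 left. "
    "How many did he start with?"
)
```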

Self-Consistency (SC)

Self-Consistency was proposed by Wang et al. (2023) as an important enhancement to CoT. The core idea is: sample multiple different reasoning paths for the same problem, then select the most consistent answer through majority voting.

The workflow is as follows:

  1. For the same problem, perform multiple CoT inferences using a higher temperature (e.g., sampling 10–40 different reasoning chains)
  2. Each reasoning chain independently arrives at an answer
  3. Conduct majority voting across all answers and select the most frequent one as the final output

The advantage of SC is that it does not rely on the correctness of any single reasoning path. Instead, it improves reliability through the "democratic vote" of multiple paths. Even if some reasoning chains err, as long as the majority converge on the correct answer, the final result will be correct.
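The voting step can be sketched as follows. The answer-extraction regex assumes chains end with a "The answer is ..." line, as in the CoT exemplars above; in practice extraction must match whatever output format the prompt enforces:

```python
import re
from collections import Counter

def extract_answer(chain: str):
    """Pull the final answer out of a reasoning chain ('The answer is 9.')."""
    m = re.search(r"The answer is ([\d.]+)", chain)
    return m.group(1).rstrip(".") if m else None

def self_consistency(chains):
    """Majority vote over answers from independently sampled reasoning chains."""
    answers = [a for a in map(extract_answer, chains) if a is not None]
    return Counter(answers).most_common(1)[0][0]

# Three sampled chains; one is faulty, but the vote still recovers 9.
chains = [
    "They used 20, so 23-20=3. Then 3+6=9. The answer is 9.",
    "23-20=3 apples left, plus 6 bought is 9. The answer is 9.",
    "23+6=29, minus 20 is 8. The answer is 8.",  # faulty chain
]
print(self_consistency(chains))  # → 9
```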


ReAct

Fundamentals

ReAct (Reason + Act) was proposed by Yao et al. (2023) in the paper ReAct: Synergizing Reasoning and Acting in Language Models. The core idea is: instead of expecting the LLM to produce a final answer in a single step, the agent should behave like a human — interleaving thinking and acting.

The key innovation of ReAct lies in unifying reasoning and acting within a single interleaved loop:

Thought: I need to look up some information to answer this question
Action: Search["keyword"]
Observation: [results returned by search]
Thought: Based on the search results, I now know... but I still need to verify...
Action: Lookup["supplementary information"]
Observation: [results returned by lookup]
Thought: Now I have enough information to answer the question
Action: Finish["final answer"]

Comparison with Pure CoT and Pure Act

| Method   | Reasoning | Acting | Characteristics                                             |
|----------|-----------|--------|-------------------------------------------------------------|
| Pure CoT | Yes       | No     | Cannot access external information; prone to hallucination  |
| Pure Act | No        | Yes    | Lacks planning; actions are unguided                        |
| ReAct    | Yes       | Yes    | Reasoning and acting support each other; more reliable      |

The problem with pure CoT is that the model can only reason based on knowledge from its training data, unable to interact with the external world — when encountering questions requiring real-time information or knowledge the model is uncertain about, it is prone to factual errors (hallucinations). The problem with pure Act is that the model takes actions without explicit reasoning, lacking deep consideration of goals and strategies. ReAct combines both, letting reasoning guide the direction of actions and letting the results of actions (Observations) feed back into reasoning, forming a positive feedback loop.

ReAct in Agent Applications

The ReAct pattern is the default reasoning mode in current mainstream agent frameworks (such as LangChain and LlamaIndex). In practice, the ReAct loop is typically combined with tool use: the agent's "Action" is a call to some tool (search engine, code interpreter, API, etc.), and the "Observation" is the result returned by that tool.
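The Thought/Action/Observation loop above can be sketched as a small driver. Here `llm` is a stand-in for any completion function (stubbed with scripted turns so the sketch runs), and the `Search`/`Finish` actions mirror the trace above; none of this is a specific framework's API:

```python
import re

def make_scripted_llm(turns):
    """Stub LLM that replays pre-written Thought/Action turns."""
    it = iter(turns)
    return lambda prompt: next(it)

def react(question, llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                 # "Thought: ...\nAction: Tool[input]"
        transcript += step + "\n"
        m = re.search(r"Action: (\w+)\[(.*)\]", step)
        if not m:
            break
        tool, arg = m.groups()
        if tool == "Finish":                   # terminal action carries the answer
            return arg
        observation = tools[tool](arg)         # run the tool, feed the result back
        transcript += f"Observation: {observation}\n"
    return None

tools = {"Search": lambda q: "Paris is the capital of France."}
llm = make_scripted_llm([
    "Thought: I should look this up.\nAction: Search[capital of France]",
    "Thought: The observation answers it.\nAction: Finish[Paris]",
])
print(react("What is the capital of France?", llm, tools))  # → Paris
```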


Reflexion

Fundamentals

Reflexion was proposed by Shinn et al. (2023) in the paper Reflexion: Language Agents with Verbal Reinforcement Learning. It introduces a self-reflection mechanism that enables agents to learn from failure without updating model weights.

Traditional reinforcement learning uses scalar reward signals (e.g., +1/-1) to update policies, but such signals are too sparse to guide specific behavioral improvements. The core insight of Reflexion is: replace scalar rewards with natural language reflections (linguistic feedback), allowing the agent to describe its mistakes and improvement directions in words.

Workflow

A complete Reflexion cycle consists of three core components:

  1. Actor: Executes the task and generates an action trajectory
  2. Evaluator: Assesses the quality of the Actor's output and determines whether the task succeeded
  3. Self-Reflection: When the task fails, generates a natural language reflection summary

The specific process is as follows:

Attempt 1:
  Actor executes task → obtains result → Evaluator judges: FAIL
  → Self-Reflection: "I made an error at step 3 — I mistook X for Y.
     Next time I should verify X's accuracy before proceeding."

Attempt 2:
  Actor executes task (referencing prior reflections) → obtains result → Evaluator: FAIL
  → Self-Reflection: "Although I avoided the previous mistake, I overlooked
     constraint Z in the final step. Next time I need to add Z verification
     in my final check."

Attempt 3:
  Actor executes task (referencing all historical reflections) → obtains result → Evaluator: SUCCESS!
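The three-component cycle above can be sketched as a retry loop whose only persistent state is the list of natural-language reflections. The `actor`, `evaluator`, and `reflect` callables are stand-ins for LLM-backed components; the toy instantiation is illustrative only:

```python
def reflexion_loop(task, actor, evaluator, reflect, max_trials=3):
    memory = []                                # accumulated verbal reflections
    for trial in range(max_trials):
        result = actor(task, memory)           # Actor sees all past reflections
        if evaluator(result):                  # Evaluator: did the task succeed?
            return result, memory
        memory.append(reflect(task, result))   # Self-Reflection on the failure
    return None, memory

# Toy instantiation: the actor succeeds once a reflection is in its memory.
actor = lambda task, mem: "correct" if mem else "wrong"
evaluator = lambda result: result == "correct"
reflect = lambda task, result: f"Output '{result}' failed; verify constraints next time."

result, memory = reflexion_loop("demo task", actor, evaluator, reflect)
print(result, len(memory))  # → correct 1
```

Note that nothing about the model changes between trials; only the reflection memory grows, which is what makes the scheme applicable to black-box LLMs.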

Verbal Reinforcement Learning

Reflexion can be viewed as a form of "Verbal Reinforcement Learning" (Verbal RL):

  • Traditional RL: Updates network weights via numerical rewards → changes policy
  • Verbal RL: Updates memory via natural language reflections → changes behavior

The advantages of this approach are that reflection content is human-readable, making it easy to debug and understand the agent's learning process, and it requires no gradient updates, making it applicable to any black-box LLM.

Application Scenarios

Reflexion excels in the following scenarios:

  • Code generation: Generate code → run tests → reflect based on error messages → fix code
  • Decision-making tasks: Complete multi-step tasks in interactive environments (e.g., ALFWorld)
  • Reasoning tasks: Progressively approach the correct answer in complex reasoning problems that require multiple attempts

ToT (Tree of Thoughts)

Fundamentals

ToT (Tree of Thoughts) was proposed by Yao et al. (2023) in the paper Tree of Thoughts: Deliberate Problem Solving with Large Language Models. If CoT is a linear chain of reasoning, then ToT extends the reasoning process into a tree structure, allowing the model to explore and backtrack across multiple possible reasoning directions.

The core ideas of ToT are:

  1. Thought decomposition: Decompose the problem-solving process into intermediate steps (thoughts)
  2. Multi-path generation: Generate multiple candidate thoughts at each step
  3. Evaluation and selection: Evaluate each candidate thought (either by the LLM itself or by external evaluation)
  4. Search strategy: Use systematic search algorithms to find the optimal path through the thought tree

Search Strategies: BFS vs DFS

ToT supports two classic search strategies:

Breadth-First Search (BFS):

  • Retains the top k most promising nodes at each level
  • Suitable for problems requiring global comparison
  • Advantage: Less likely to miss the global optimum
  • Disadvantage: Higher memory and computational overhead

Depth-First Search (DFS):

  • Explores along the most promising branch as deep as possible
  • Backtracks to the previous level when hitting a dead end, then tries other branches
  • Suitable for problems with deep search spaces where correct paths are relatively concentrated
  • Advantage: High memory efficiency
  • Disadvantage: May get trapped in local optima
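The BFS variant can be sketched generically: expand every frontier node, evaluate the candidates, and keep only the k most promising per level. The `expand`, `score`, and `is_solution` callables stand in for LLM-based thought generation and evaluation; the string-building toy problem is purely illustrative:

```python
def tot_bfs(root, expand, score, is_solution, beam_width=2, max_depth=3):
    """Breadth-first Tree of Thoughts: keep the top-k thoughts per level."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [t for node in frontier for t in expand(node)]
        for t in candidates:
            if is_solution(t):
                return t
        # Evaluation and selection: retain the k most promising thoughts
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return None

# Toy search: thoughts are strings grown one character at a time toward "cat".
target = "cat"
expand = lambda s: [s + c for c in "abct"]
score = lambda s: sum(a == b for a, b in zip(s, target))  # matching prefix chars
found = tot_bfs("", expand, score, lambda s: s == target)
print(found)  # → cat
```

A DFS variant would instead recurse into the single best candidate and backtrack when `score` falls below a threshold.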

Key Differences from CoT

| Feature                 | CoT                                      | ToT                                                  |
|-------------------------|------------------------------------------|------------------------------------------------------|
| Reasoning structure     | Linear chain                             | Tree structure                                       |
| Candidate paths         | 1 (or multiple independent chains in SC) | Multiple branches per step; backtracking supported   |
| Search strategy         | None (sequential generation)             | BFS / DFS                                            |
| Intermediate evaluation | None                                     | Evaluates candidate thoughts at each step            |
| Suitable scenarios      | General reasoning tasks                  | Complex problems requiring exploration and backtracking |
| Computational cost      | Low                                      | Higher (multiple LLM calls)                          |

Typical Applications

ToT shows clear advantages in the following types of problems:

  • Game of 24: Using 4 numbers and arithmetic operations to reach 24, requiring exploration of many combinations
  • Creative writing: Exploring different narrative directions and selecting the best path
  • Mathematical proofs: Trying different proof strategies

Plan-and-Execute

Fundamentals

Plan-and-Execute is a relatively robust approach to agent design, with the core principle of decoupling planning from execution:

  • Planning: First, have the LLM thoroughly analyze the user's requirements and formulate a detailed, step-by-step action plan
  • Executing: Then, the agent (or another LLM) strictly follows the plan, executing it step by step until completion

This design draws inspiration from classical AI planning and stands in contrast to ReAct's "think while doing" approach — Plan-and-Execute emphasizes "think it through first, then act."
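The two-phase structure can be sketched as follows. `plan_llm` and `execute_step` are stand-ins for LLM-backed components, and the step list is illustrative:

```python
def plan_and_execute(goal, plan_llm, execute_step):
    steps = plan_llm(goal)                    # Planning: full plan before any action
    results = []
    for step in steps:                        # Executing: follow the plan in order
        results.append(execute_step(step, results))
    return results

plan_llm = lambda goal: ["search topic", "summarize findings", "draft report"]
execute_step = lambda step, prior: f"done: {step}"
print(plan_and_execute("write a report", plan_llm, execute_step)[-1])
# → done: draft report
```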

LLMCompiler

LLMCompiler (Kim et al., 2024) is an important optimization of the Plan-and-Execute pattern. Its core idea is to generate a task DAG (Directed Acyclic Graph) during the planning phase, then identify tasks that can be executed in parallel, thereby significantly improving execution efficiency.

Workflow:

  1. Planner: Analyzes task dependencies and generates a task DAG
  2. Task Fetching Unit: Schedules parallelizable tasks for concurrent execution
  3. Joiner: Aggregates all task results and determines whether replanning is needed
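The scheduling idea can be sketched with a thread pool: a task becomes ready once all of its dependencies are done, and every ready task in a wave runs concurrently. Task names and the dependency map are illustrative, not LLMCompiler's API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps):
    """tasks: name -> fn(results); deps: name -> set of prerequisite names."""
    done, results = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(tasks):
            ready = [n for n in tasks if n not in done and deps[n] <= done]
            futures = {n: pool.submit(tasks[n], results) for n in ready}
            for name, fut in futures.items():  # Joiner: collect this wave's results
                results[name] = fut.result()
                done.add(name)
    return results

tasks = {
    "fetch_a": lambda r: 2,
    "fetch_b": lambda r: 3,                    # runs in parallel with fetch_a
    "combine": lambda r: r["fetch_a"] + r["fetch_b"],
}
deps = {"fetch_a": set(), "fetch_b": set(), "combine": {"fetch_a", "fetch_b"}}
print(run_dag(tasks, deps)["combine"])  # → 5
```

Here `fetch_a` and `fetch_b` have no dependencies and execute in the same wave, which is exactly the parallelism a sequential ReAct-style loop cannot exploit.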

Dynamic Replanning

In practice, initial plans often cannot be executed perfectly — the environment may change, certain steps may fail, or new information may be discovered during execution. Therefore, modern Plan-and-Execute frameworks typically support dynamic replanning:

  • After completing each step, evaluate the current state
  • If deviations from expectations are detected, trigger replanning
  • Generate an updated plan based on completed steps and new information
  • Continue executing the updated plan
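The four steps above can be sketched as a loop that checks state after each step and, on deviation, replaces the remaining plan. All callables are stand-ins for LLM-backed components, and the toy scenario is illustrative:

```python
def execute_with_replanning(goal, plan, execute, deviated, replan, max_replans=3):
    steps, replans = plan(goal), 0
    done = []
    while steps:
        done.append(execute(steps.pop(0)))
        # After each step, evaluate the current state against expectations
        if steps and deviated(done) and replans < max_replans:
            steps = replan(goal, done)   # new plan from completed work + new info
            replans += 1
    return done

plan = lambda goal: ["draft", "publish"]
execute = lambda step: step
deviated = lambda done: done == ["draft"]          # the draft came back flawed
replan = lambda goal, done: ["revise", "publish"]  # updated remaining plan
print(execute_with_replanning("post", plan, execute, deviated, replan))
# → ['draft', 'revise', 'publish']
```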

This mechanism allows Plan-and-Execute to maintain its structural advantages while also possessing a degree of flexibility.


Comparative Summary

| Method           | Core Idea                  | Reasoning Structure   | External Interaction | Self-Correction                 | Computational Cost | Typical Use Cases                      |
|------------------|----------------------------|-----------------------|----------------------|---------------------------------|--------------------|----------------------------------------|
| CoT              | Step-by-step reasoning     | Linear chain          | None                 | None (SC partially compensates) | Low                | Math reasoning, commonsense QA         |
| ReAct            | Alternating thought-action | Linear loop           | Yes (tool calls)     | Via observations                | Medium             | Information retrieval, multi-step tasks |
| Reflexion        | Learning from failure      | Multi-round iteration | Optional             | Yes (verbal reflection)         | Medium–High        | Code generation, decision tasks        |
| ToT              | Tree-based exploration     | Tree structure        | None                 | Via backtracking                | High               | Creative/math problems requiring search |
| Plan-and-Execute | Plan first, execute later  | Phased                | Yes (tool calls)     | Dynamic replanning              | Medium             | Complex multi-step workflows           |

How to Choose

  • Simple reasoning tasks: CoT is sufficient
  • Tasks requiring external information: ReAct
  • Tasks allowing multiple attempts with clear feedback: Reflexion
  • Problems with large search spaces requiring exploration: ToT
  • Structured multi-step workflows: Plan-and-Execute

In practice, these reasoning patterns are rarely used in isolation within agent systems. Instead, they are combined based on task characteristics. For example, an agent might use Plan-and-Execute for overall planning, employ ReAct within each execution step to interact with tools, and trigger Reflexion for self-reflection and retry upon failure.
