ReAct and Tool Reasoning
Overview
ReAct (Reasoning + Acting), proposed by Yao et al. (2022), is one of the most important architectural paradigms for LLM agents. It unifies reasoning (Thought) and action (Action) in an interleaved sequence, enabling LLMs to complete complex tasks through interaction with external environments. This article provides an in-depth analysis of ReAct's principles, implementation, and extensions.
1. Motivation: Unifying Reasoning and Action
1.1 Limitations of Pure Reasoning
Chain-of-Thought enables LLMs to show their reasoning process, but has fundamental limitations:
- Stale knowledge: Knowledge in model parameters has a cutoff date
- Factual hallucination: Reasoning may be based on incorrect premises
- Unverifiable: Can only reason internally, cannot verify information
- Cannot act: Can only generate text, cannot execute operations
1.2 Limitations of Pure Action
Directly having LLMs call tools (like Toolformer) also has problems:
- Lack of planning: Does not know why a particular tool should be called
- Cannot reason: Cannot analyze information returned by tools
- Blind action: May repeat ineffective operations
1.3 ReAct's Core Insight
When humans solve problems, thinking and acting alternate: thinking guides action, and the results of action update thinking.
2. ReAct Core Mechanism
2.1 Thought-Action-Observation Loop
graph TD
Q[Question/Task] --> T1[Thought 1<br/>Analyze problem, formulate strategy]
T1 --> A1[Action 1<br/>Call tool/Execute operation]
A1 --> O1[Observation 1<br/>Tool returns result]
O1 --> T2[Thought 2<br/>Analyze result, decide next step]
T2 --> A2[Action 2<br/>Call another tool]
A2 --> O2[Observation 2<br/>New result]
O2 --> T3[Thought 3<br/>Synthesize information, draw conclusion]
T3 --> ANS[Final Answer]
2.2 Formal Definition
Given task \(q\), ReAct generates an interleaved sequence of thoughts, actions, and observations:
\[
q \rightarrow t_1, a_1, o_1, t_2, a_2, o_2, \ldots, t_n, a_n
\]
where:
- \(t_i \sim P_{\text{LLM}}(\cdot \mid q, t_1, a_1, o_1, \ldots, o_{i-1})\) -- LLM-generated thought
- \(a_i \sim P_{\text{LLM}}(\cdot \mid q, t_1, a_1, o_1, \ldots, t_i)\) -- LLM-selected action
- \(o_i = \text{Env}(a_i)\) -- Observation after environment executes action
Key: Thought \(t_i\) is generated internally by the LLM (not passed to the environment), used for:
- Tracking current progress and goals
- Reasoning about what action to take
- Analyzing the meaning of observation results
- Deciding whether sufficient information exists to answer
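The interleaved sequence can be made concrete with a small data structure; the names `ReActStep` and `Trajectory` are illustrative, not from the paper. Note that thoughts are serialized into the LLM's conditioning context but never sent to the environment:

```python
from dataclasses import dataclass, field

@dataclass
class ReActStep:
    thought: str      # internal to the LLM; never executed by the environment
    action: str       # passed to the environment (tool call or Finish)
    observation: str  # returned by the environment

@dataclass
class Trajectory:
    question: str
    steps: list = field(default_factory=list)

    def context(self) -> str:
        """Serialize q, t_1, a_1, o_1, ... as conditioning context for the next step."""
        lines = [f"Question: {self.question}"]
        for i, s in enumerate(self.steps, 1):
            lines += [f"Thought {i}: {s.thought}",
                      f"Action {i}: {s.action}",
                      f"Observation {i}: {s.observation}"]
        return "\n".join(lines)
```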
2.3 Comparison with CoT and Act-Only
| Method | Sequence Format | External Information | Reasoning Visible |
|---|---|---|---|
| Standard | \(q \rightarrow a\) | None | No |
| CoT | \(q \rightarrow t_1, t_2, \ldots \rightarrow a\) | None | Yes |
| Act-Only | \(q \rightarrow a_1, o_1, a_2, o_2, \ldots\) | Yes | No |
| ReAct | \(q \rightarrow t_1, a_1, o_1, t_2, a_2, o_2, \ldots\) | Yes | Yes |
3. Example Analysis
3.1 Knowledge QA Example
Question: What program runs on the device that Apple Remote can control?
Thought 1: I need to first look up what devices Apple Remote can control.
Action 1: Search[Apple Remote]
Observation 1: Apple Remote is a remote control by Apple that can control
Front Row media software on Mac, Apple TV, and iPhone.
Thought 2: Apple Remote can control devices running Front Row.
But what is Front Row? Let me verify.
Action 2: Search[Front Row software]
Observation 2: Front Row is a discontinued media center software by Apple
for Mac, similar to Windows Media Center.
Thought 3: Based on the search results, the program running on
the devices Apple Remote controls is Front Row.
Action 3: Finish[Front Row]
3.2 Comparative Analysis
Possible error with pure CoT:
Thought: Apple Remote is Apple's remote control... it probably controls iTunes...
Answer: iTunes ← Hallucination!
Possible problem with pure Act-Only:
Action 1: Search[Apple Remote program] ← Imprecise search term
Observation 1: (irrelevant results)
Action 2: Search[Apple Remote software] ← Blind retry
ReAct's advantage: Thought guides more precise searches; Observation corrects possible hallucinations.
4. MRKL System
MRKL (Modular Reasoning, Knowledge and Language, pronounced "miracle"), proposed by Karpas et al. (2022), is another tool reasoning architecture.
4.1 Architecture
graph TD
INPUT[User Input] --> ROUTER[Router/LLM]
ROUTER --> M1[Math Module<br/>Calculator/Wolfram]
ROUTER --> M2[Search Module<br/>Google/Wikipedia]
ROUTER --> M3[Database Module<br/>SQL Query]
ROUTER --> M4[Code Module<br/>Python Execution]
ROUTER --> M5[Knowledge Module<br/>LLM's Own Knowledge]
M1 --> AGG[Aggregator]
M2 --> AGG
M3 --> AGG
M4 --> AGG
M5 --> AGG
AGG --> OUTPUT[Final Answer]
4.2 Differences from ReAct
| Dimension | ReAct | MRKL |
|---|---|---|
| Control flow | Sequential iteration | Routing dispatch |
| Module selection | Dynamic at each step | One-time routing |
| Reasoning visibility | Thought explicitly visible | Routing decision implicit |
| Multi-step reasoning | Natively supported | Requires additional mechanisms |
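A minimal sketch of MRKL-style one-time routing, with simple keyword rules standing in for the LLM router and stub lambdas standing in for the expert modules (all names here are hypothetical):

```python
# Hypothetical MRKL-style router: one routing decision up front, then a
# single dispatch to the chosen module -- no iterative think-act loop.
def route(query: str) -> str:
    """Return the name of the single module to handle the query (keyword stub)."""
    if any(ch.isdigit() for ch in query) and any(op in query for op in "+-*/"):
        return "math"
    return "search"

MODULES = {
    # eval stands in for a calculator module; only safe on trusted demo input
    "math": lambda q: str(eval(q)),
    # stub standing in for a real retrieval backend
    "search": lambda q: f"search results for {q!r}",
}

def mrkl_answer(query: str) -> str:
    module = route(query)          # one-time routing decision
    return MODULES[module](query)  # single dispatch; results go to the aggregator
```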
5. Mathematical Analysis of Action Selection
5.1 Action Space
Define action space \(\mathcal{A} = \{a_1, a_2, \ldots, a_K\}\), where each action \(a_i\) may be:
- A tool call (with parameters)
- A termination action (output answer)
- An internal action (reasoning, waiting)
5.2 Selection Probability
The LLM's probability of selecting action \(a_i\) can be modeled as a temperature-scaled softmax over action scores:
\[
P(a_i \mid c) = \frac{\exp\big(s(a_i, c)/T\big)}{\sum_{j=1}^{K} \exp\big(s(a_j, c)/T\big)}
\]
where \(c\) is the current context (containing all previous thoughts, actions, and observations), \(s(a_i, c)\) is the model's score for action \(a_i\) given \(c\), and \(T\) is the temperature.
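The temperature-scaled selection can be computed as a standard softmax; this sketch assumes access to per-action scores (logits), which hosted LLM APIs typically do not expose directly:

```python
import math

def action_probs(logits, temperature=1.0):
    """Softmax over action scores with temperature T."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

Lowering \(T\) sharpens the distribution toward the highest-scoring action; raising it flattens the distribution and increases exploration.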
5.3 Grounding Effect
ReAct's observations ground each reasoning step in external evidence: thought \(t_{i+1}\) conditions on observation \(o_i\), so a mistaken premise can be corrected by what the tool actually returns. When observations supply relevant and correct information, this grounded reasoning is more reliable than purely internal, parametric reasoning.
6. Implementation Patterns
6.1 Prompt Template
REACT_PROMPT = """
Answer the following question by reasoning step by step
and using tools when needed.
Available tools:
- Search[query]: Search Wikipedia for information
- Lookup[keyword]: Look up a keyword in the current page
- Finish[answer]: Return the final answer
{examples}
"""
6.2 Parsing and Execution Loop
def react_loop(question, tools, max_steps=10):
context = f"Question: {question}\n"
for step in range(max_steps):
# LLM generates Thought + Action
response = llm.generate(REACT_PROMPT + context)
thought, action = parse_response(response)
context += f"Thought {step+1}: {thought}\n"
context += f"Action {step+1}: {action}\n"
# Check if termination action
if action.startswith("Finish"):
return extract_answer(action)
# Execute action to get observation
observation = execute_tool(action, tools)
context += f"Observation {step+1}: {observation}\n"
return "Maximum steps reached"
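The loop above leaves `parse_response` and `extract_answer` undefined; here is a minimal regex-based sketch, assuming the model emits `Thought: ...` and `Action: Tool[argument]` lines in the format shown in the prompt template:

```python
import re

# "Thought 1:" / "Action 1:" step numbers are optional in both patterns
THOUGHT_RE = re.compile(r"Thought(?:\s*\d+)?:\s*(.+)")
ACTION_RE = re.compile(r"Action(?:\s*\d+)?:\s*(\w+\[[^\]]*\])")

def parse_response(response: str):
    """Split one model turn into (thought, action); raise if no Action found."""
    thought_m = THOUGHT_RE.search(response)
    action_m = ACTION_RE.search(response)
    if action_m is None:
        raise ValueError("no parsable Action in model output")
    thought = thought_m.group(1).strip() if thought_m else ""
    return thought, action_m.group(1)

def extract_answer(action: str) -> str:
    """Pull the answer out of a Finish[...] action."""
    return action[len("Finish["):-1]
```

In practice a failed parse is usually fed back to the model as an error observation rather than raised, so the agent can reformat its output.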
6.3 Tool Definition Best Practices
tools = [
{
"name": "search",
"description": "Search for information on the web. "
"Use when you need factual information.",
"parameters": {
"query": {
"type": "string",
"description": "The search query"
}
}
},
{
"name": "calculator",
"description": "Perform mathematical calculations. "
"Use when you need precise arithmetic.",
"parameters": {
"expression": {
"type": "string",
"description": "Mathematical expression to evaluate"
}
}
}
]
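Given schemas like these, `execute_tool` can be a small dispatch table. The stub implementations below are placeholders, and `eval` is shown for brevity only; it is unsafe on untrusted input:

```python
# Hypothetical implementations keyed by the "name" field of the schemas above.
IMPLEMENTATIONS = {
    "search": lambda query: f"results for {query!r}",        # stub; real code hits an API
    "calculator": lambda expression: str(eval(expression)),  # demo only; eval is unsafe
}

def execute_tool(action: str, tools):
    """Dispatch an action string like 'calculator[2*21]' to its implementation."""
    name, _, rest = action.partition("[")
    name = name.strip().lower()
    if name not in IMPLEMENTATIONS:
        # Returned as an Observation so the LLM can recover and retry
        return f"Error: unknown tool {name!r}"
    return IMPLEMENTATIONS[name](rest.rstrip("]"))
```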
7. ReAct Variants and Extensions
7.1 ReAct + Self-Consistency
Sample multiple ReAct trajectories independently (with nonzero temperature) and take a majority vote over the final answers.
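A sketch of the voting wrapper; `run_once` stands for any callable that executes one full ReAct trajectory (such as the `react_loop` above) and returns a final answer string:

```python
from collections import Counter

def self_consistent_answer(run_once, question, n_samples=5):
    """Majority vote over n independent ReAct runs.

    Sampling temperature must be > 0 so the runs can diverge;
    ties break toward the answer produced first.
    """
    answers = [run_once(question) for _ in range(n_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```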
7.2 ReAct + Reflexion
Reflect after failure, storing experience in memory:
[First attempt fails]
Reflection: Last time I searched with incorrect keywords, leading to
irrelevant information. Next time I should analyze the key
entities in the problem first, then construct search terms.
[Second attempt, with reflection memory]
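The retry-with-memory pattern can be sketched as follows; `run_once` and `reflect` are hypothetical callables standing in for one ReAct episode and one LLM self-reflection call, and the names are not from the Reflexion paper:

```python
def react_with_reflexion(question, run_once, reflect, max_attempts=3):
    """Retry loop: after each failure, store a self-reflection and retry
    with the accumulated reflections available as extra context.

    run_once(question, reflections) -> (answer, success_flag)
    reflect(question, answer) -> short lesson string
    """
    reflections = []  # episodic memory carried across attempts
    for _ in range(max_attempts):
        answer, ok = run_once(question, reflections)
        if ok:
            return answer
        reflections.append(reflect(question, answer))
    return answer  # best effort after exhausting attempts
```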
7.3 ReWOO (Reasoning WithOut Observation)
Xu et al. (2023) proposed generating the complete reasoning plan first (without observations), then executing all tool calls in a batch.
Advantage: reduces the number of LLM calls and total token consumption, since intermediate observations are not fed back through the model. The trade-off is that the plan cannot adapt mid-execution, so ReWOO suits tasks whose tool-call structure is predictable up front.
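A sketch of the plan-then-execute control flow, using `#E<k>` evidence variables in the style of the ReWOO paper; `plan_fn` stands in for the single planning LLM call:

```python
def rewoo(question, plan_fn, tools):
    """plan_fn returns steps like ('#E1', 'search', 'Apple Remote');
    later steps may reference earlier evidence via '#E<k>' in their input."""
    plan = plan_fn(question)           # one LLM call, before any observation
    evidence = {}
    for var, tool, arg in plan:        # batch execution of the fixed plan
        for k, v in evidence.items():  # substitute earlier evidence variables
            arg = arg.replace(k, v)
        evidence[var] = tools[tool](arg)
    return evidence[plan[-1][0]]       # answer = result of the final step
```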
Cross-Reference
For a detailed discussion of tool use, see Tool Use Survey.
8. Limitations and Challenges
- Context window: Long interaction histories consume many tokens, potentially exceeding window limits
- Tool selection errors: LLM may select the wrong tool or pass incorrect parameters
- Infinite loops: May get stuck in repetitive think-act cycles
- Observation noise: Information returned by tools may be inaccurate or irrelevant
- Planning depth: ReAct is essentially greedy, lacking global planning
- Cost: Each step requires LLM reasoning, consuming many tokens
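The infinite-loop failure mode, for example, is often handled with a simple repetition guard inside the agent loop (a heuristic sketch, not from the ReAct paper):

```python
def is_looping(actions, window=3):
    """True if the most recent action already occurred in the last `window` steps.

    A driver loop can use this to abort, raise temperature, or inject a
    corrective observation instead of repeating an ineffective action.
    """
    if not actions:
        return False
    return actions[-1] in actions[-window - 1:-1]
```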
References
- Yao, S. et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023.
- Karpas, E. et al. (2022). MRKL Systems: A Modular, Neuro-Symbolic Architecture that Combines Large Language Models, External Knowledge Sources and Discrete Reasoning. arXiv:2205.00445.
- Schick, T. et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. NeurIPS 2023.
- Xu, B. et al. (2023). ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models. arXiv:2305.18323.
- Nakano, R. et al. (2021). WebGPT: Browser-assisted Question-answering with Human Feedback. arXiv:2112.09332.