Overview of Tool Use
Introduction
Although language models excel at text generation and reasoning, they have inherent limitations in precise computation, real-time information retrieval, and code execution. Tool use lets an agent invoke external functions and APIs, greatly extending the range of tasks it can complete.
The Symbol Grounding Problem
Harnad (1990) posed the question: how do symbols in a symbolic system acquire meaning? LLMs process tokens (symbols), but real-world operations require interaction with the physical world and digital systems. Tool use is a key pathway for LLMs to achieve "grounding."
```mermaid
graph LR
    subgraph "LLM without Tools"
        A[Language Ability] --> B[Text Generation]
        A --> C[Reasoning]
        A --> D[Knowledge Recall]
    end
    subgraph "Tool-Augmented Agent"
        E[Language Ability] --> F[Text Generation]
        E --> G[Reasoning]
        E --> H[Knowledge Recall]
        E --> I[Tool Invocation]
        I --> J[Code Execution]
        I --> K[API Calls]
        I --> L[Database Queries]
        I --> M[Browser Operations]
        I --> N[File I/O]
    end
```
Development History
Early Explorations
| Date | Work | Contribution |
|---|---|---|
| 2021 | WebGPT (OpenAI) | LLM learns to use search engines |
| 2022 | LaMDA (Google) | Tool invocation within dialogue |
| 2023.02 | Toolformer (Meta) | LLM self-learns when and how to use tools |
| 2023.03 | HuggingGPT | LLM as a controller dispatching AI models |
| 2023.03 | ChatGPT Plugins | Commercialized tool ecosystem |
| 2023.05 | Gorilla | Fine-tuned model for large-scale API calls |
| 2023.06 | Function Calling (OpenAI) | Standardized function calling protocol |
| 2024.03 | Claude Tool Use (Anthropic) | Native tool use support |
| 2024.11 | MCP (Anthropic) | Model Context Protocol standard |
Core Ideas of Toolformer
Schick et al. (2023) introduced Toolformer, which enables LLMs to autonomously learn tool usage:
- Insert API call annotations at positions in training text where tools might be needed
- Execute API calls to obtain results
- Retain only useful API calls (those that reduce perplexity)
- Fine-tune the model on this data
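The filtering step above can be sketched as a simple loss comparison. In Toolformer, an inserted API call is kept only if conditioning on the call *and its result* reduces the model's loss on the following tokens by at least a threshold, compared with the better of (a) no call at all and (b) the call without its result. A minimal sketch, assuming the three loss values have already been computed:

```python
# Hedged sketch of Toolformer-style filtering (Schick et al., 2023).
# The loss values and the threshold `tau` are illustrative inputs, not
# the paper's exact numbers.

def keep_api_call(loss_no_call: float,
                  loss_call_no_result: float,
                  loss_call_with_result: float,
                  tau: float = 1.0) -> bool:
    """Keep the call only if the executed result is useful enough."""
    baseline = min(loss_no_call, loss_call_no_result)
    return baseline - loss_call_with_result >= tau

keep_api_call(5.2, 5.5, 3.8)  # True: loss drops by 1.4 >= tau
keep_api_call(5.2, 5.5, 4.9)  # False: loss drops by only 0.3
```

Only the calls that pass this filter are written back into the training text, so fine-tuning teaches the model when a tool genuinely helps prediction.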
Why Tools Matter
Inherent Limitations of LLMs
| Limitation | Example | Tool Solution |
|---|---|---|
| Imprecise math | Errors in large number multiplication | Calculator / code executor |
| Knowledge cutoff | Unaware of latest news | Search engine / API |
| Cannot perform actions | Cannot send emails | Email API |
| No access to private data | Unaware of internal company info | Database queries |
| Limited multimodal ability | Image generation | DALL-E / Stable Diffusion API |
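The first row of the table is easy to demonstrate: large-number multiplication that an LLM often gets wrong is trivial for a tool. A minimal sketch of a calculator tool that evaluates arithmetic exactly by whitelisting AST nodes (so untrusted input cannot execute arbitrary code); the function name and supported operators are illustrative choices:

```python
import ast
import operator

# Allowed operators only; anything else raises, so this is safe to
# expose to model-generated input (unlike raw eval()).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expression: str):
    """Evaluate an arithmetic expression exactly."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

calculator("123456789 * 987654321")  # exact: 121932631112635269
```

Python's arbitrary-precision integers make the result exact at any size, which is precisely what token-by-token generation cannot guarantee.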
Classification of Tool Use
```mermaid
graph TB
    TU[Tool Use] --> INFO[Information Retrieval]
    TU --> ACT[Action Execution]
    TU --> COMP[Computation Augmentation]
    TU --> CREATE[Content Creation]
    INFO --> S[Search Engines]
    INFO --> DB[Database Queries]
    INFO --> API1[Information APIs]
    ACT --> EMAIL[Send Emails]
    ACT --> FILE[File Operations]
    ACT --> DEPLOY[Deploy Services]
    COMP --> CODE[Code Execution]
    COMP --> CALC[Calculator]
    COMP --> DATA[Data Analysis]
    CREATE --> IMG[Image Generation]
    CREATE --> DOC[Document Generation]
    CREATE --> CHART[Chart Drawing]
```
Basic Workflow of Tool Use
1. Understand Task → 2. Plan Tool Usage → 3. Select Tool → 4. Construct Parameters → 5. Execute Call → 6. Parse Result → 7. Integrate into Response
```python
# Basic tool use loop
def agent_loop(query, tools, llm):
    messages = [{"role": "user", "content": query}]
    while True:
        response = llm.chat(messages, tools=tools)
        if response.finish_reason == "tool_calls":
            # Keep the assistant turn that requested the calls in history,
            # so each tool result can be matched to its call
            messages.append(response.message)
            # LLM decides to call tools
            for tool_call in response.tool_calls:
                result = execute_tool(
                    tool_call.function.name,
                    tool_call.function.arguments,
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result),
                })
        else:
            # LLM provides the final answer
            return response.content
```
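The loop above assumes an `execute_tool` dispatcher. A minimal sketch of one, using a name-to-callable registry; the registry, decorator, and the `get_weather` stub are hypothetical names for illustration. Arguments arrive as a JSON string, as in OpenAI-style tool calls:

```python
import json

TOOL_REGISTRY = {}

def register_tool(name):
    """Decorator that registers a callable under a tool name."""
    def deco(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return deco

def execute_tool(name, arguments):
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    try:
        kwargs = json.loads(arguments) if isinstance(arguments, str) else arguments
        return TOOL_REGISTRY[name](**kwargs)
    except Exception as e:
        # Return errors as data so the LLM can read them and retry,
        # instead of crashing the agent loop
        return {"error": str(e)}

@register_tool("get_weather")
def get_weather(city: str):
    return {"city": city, "temp_c": 21}  # stub result for illustration

execute_tool("get_weather", '{"city": "Paris"}')
```

Returning errors as structured results rather than raising is a common design choice: the model sees the failure in the next turn and can correct its arguments or pick a different tool.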
The Importance of Tool Descriptions
Well-written tool descriptions are key to successful tool use: the LLM relies on them to decide when to call a tool and which one to call.
```python
# Good tool description
good_tool = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base. Use when the user asks about company policies, product documentation, or technical specifications. Not suitable for general knowledge questions.",
    "parameters": {
        "query": {
            "type": "string",
            "description": "Search query in natural language describing the information to find"
        },
        "category": {
            "type": "string",
            "enum": ["policy", "product", "technical"],
            "description": "Search category to help narrow the scope"
        }
    }
}

# Poor tool description
bad_tool = {
    "name": "search",
    "description": "Search",
    "parameters": {
        "q": {"type": "string"}
    }
}
```
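Before a spec like `good_tool` reaches the model, it must be wrapped in the provider's wire format. A minimal sketch for the JSON-schema "tools" shape used by OpenAI-style chat APIs (field names follow that convention; other providers such as Anthropic use a slightly different layout, and the all-parameters-required assumption here is a simplification):

```python
def to_openai_tool(tool: dict) -> dict:
    """Wrap a flat tool spec in an OpenAI-style function-tool envelope."""
    props = tool["parameters"]
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": {
                "type": "object",
                "properties": props,
                "required": list(props),  # simplification: mark all required
            },
        },
    }

# Illustrative spec in the same flat shape as good_tool above
spec = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base.",
    "parameters": {
        "query": {"type": "string", "description": "Natural-language search query"},
    },
}
openai_tool = to_openai_tool(spec)
```

The resulting dict is what goes into the `tools` list passed alongside the messages on each chat call.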
Chapter Structure
- Function Calling Mechanisms - Function calling protocols across platforms
- MCP and Tool Protocols - Model Context Protocol and tool standardization
- Code Execution and Sandboxing - Secure code execution environments
- Browser and Computer Operations - GUI interaction capabilities
- API Orchestration and Tool Selection - Tool selection and orchestration strategies
References
- Schick, T., et al. (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools"
- Nakano, R., et al. (2021). "WebGPT: Browser-assisted question-answering with human feedback"
- Shen, Y., et al. (2023). "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face"
- Patil, S. G., et al. (2023). "Gorilla: Large Language Model Connected with Massive APIs"
- Qin, Y., et al. (2024). "Tool Learning with Large Language Models: A Survey"