Overview of Tool Use
Introduction
Although language models excel at text generation and reasoning, they have inherent limitations in precise computation, real-time information retrieval, and code execution. Tool use lets an agent invoke external functions and APIs, greatly extending the range of tasks it can complete.
The Symbol Grounding Problem
Harnad (1990) posed the question: how do symbols in a symbolic system acquire meaning? LLMs process tokens (symbols), but real-world operations require interaction with the physical world and digital systems. Tool use is a key pathway for LLMs to achieve "grounding."
```mermaid
graph LR
    subgraph "LLM without Tools"
        A[Language Ability] --> B[Text Generation]
        A --> C[Reasoning]
        A --> D[Knowledge Recall]
    end
    subgraph "Tool-Augmented Agent"
        E[Language Ability] --> F[Text Generation]
        E --> G[Reasoning]
        E --> H[Knowledge Recall]
        E --> I[Tool Invocation]
        I --> J[Code Execution]
        I --> K[API Calls]
        I --> L[Database Queries]
        I --> M[Browser Operations]
        I --> N[File I/O]
    end
```
Development History
Early Explorations
| Date | Work | Contribution |
|---|---|---|
| 2021 | WebGPT (OpenAI) | LLM learns to use search engines |
| 2022 | LaMDA (Google) | Tool invocation within dialogue |
| 2023.02 | Toolformer (Meta) | LLM self-learns when and how to use tools |
| 2023.03 | HuggingGPT | LLM as a controller dispatching AI models |
| 2023.03 | ChatGPT Plugins | Commercialized tool ecosystem |
| 2023.05 | Gorilla | Fine-tuned model for large-scale API calls |
| 2023.06 | Function Calling (OpenAI) | Standardized function calling protocol |
| 2024.03 | Claude Tool Use (Anthropic) | Native tool use support |
| 2024.11 | MCP (Anthropic) | Model Context Protocol standard |
Core Ideas of Toolformer
Schick et al. (2023) introduced Toolformer, which enables LLMs to autonomously learn tool usage:
- Insert API call annotations at positions in training text where tools might be needed
- Execute API calls to obtain results
- Retain only useful API calls (those that reduce perplexity)
- Fine-tune the model on this data
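The filtering step above can be sketched as a simple loss comparison. In Toolformer, an inserted API call is kept only if conditioning on the call *and its result* reduces the model's loss on the following tokens by at least a threshold, compared with the better of (a) no call at all and (b) the call without its result. A minimal sketch, assuming the three loss values have already been computed:

```python
# Hedged sketch of Toolformer-style filtering (Schick et al., 2023).
# The loss values and the threshold `tau` are illustrative inputs, not
# the paper's exact numbers.

def keep_api_call(loss_no_call: float,
                  loss_call_no_result: float,
                  loss_call_with_result: float,
                  tau: float = 1.0) -> bool:
    """Keep the call only if the executed result is useful enough."""
    baseline = min(loss_no_call, loss_call_no_result)
    return baseline - loss_call_with_result >= tau

keep_api_call(5.2, 5.5, 3.8)  # True: loss drops by 1.4 >= tau
keep_api_call(5.2, 5.5, 4.9)  # False: loss drops by only 0.3
```

Only the calls that pass this filter are written back into the training text, so fine-tuning teaches the model when a tool genuinely helps prediction.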
Why Tools Matter
Inherent Limitations of LLMs
| Limitation | Example | Tool Solution |
|---|---|---|
| Imprecise math | Errors in large number multiplication | Calculator / code executor |
| Knowledge cutoff | Unaware of latest news | Search engine / API |
| Cannot perform actions | Cannot send emails | Email API |
| No access to private data | Unaware of internal company info | Database queries |
| Limited multimodal ability | Image generation | DALL-E / Stable Diffusion API |
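The first row of the table is easy to demonstrate: large-number multiplication that an LLM often gets wrong is trivial for a tool. A minimal sketch of a calculator tool that evaluates arithmetic exactly by whitelisting AST nodes (so untrusted input cannot execute arbitrary code); the function name and supported operators are illustrative choices:

```python
import ast
import operator

# Allowed operators only; anything else raises, so this is safe to
# expose to model-generated input (unlike raw eval()).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expression: str):
    """Evaluate an arithmetic expression exactly."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

calculator("123456789 * 987654321")  # exact: 121932631112635269
```

Python's arbitrary-precision integers make the result exact at any size, which is precisely what token-by-token generation cannot guarantee.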
Classification of Tool Use
```mermaid
graph TB
    TU[Tool Use] --> INFO[Information Retrieval]
    TU --> ACT[Action Execution]
    TU --> COMP[Computation Augmentation]
    TU --> CREATE[Content Creation]
    INFO --> S[Search Engines]
    INFO --> DB[Database Queries]
    INFO --> API1[Information APIs]
    ACT --> EMAIL[Send Emails]
    ACT --> FILE[File Operations]
    ACT --> DEPLOY[Deploy Services]
    COMP --> CODE[Code Execution]
    COMP --> CALC[Calculator]
    COMP --> DATA[Data Analysis]
    CREATE --> IMG[Image Generation]
    CREATE --> DOC[Document Generation]
    CREATE --> CHART[Chart Drawing]
```
Basic Workflow of Tool Use
1. Understand Task → 2. Plan Tool Usage → 3. Select Tool → 4. Construct Parameters → 5. Execute Call → 6. Parse Result → 7. Integrate into Response
```python
# Basic tool use loop
def agent_loop(query, tools, llm):
    messages = [{"role": "user", "content": query}]
    while True:
        response = llm.chat(messages, tools=tools)
        if response.finish_reason == "tool_calls":
            # Keep the assistant turn that requested the calls in history,
            # so each tool result can be matched to its call
            messages.append(response.message)
            # LLM decides to call tools
            for tool_call in response.tool_calls:
                result = execute_tool(
                    tool_call.function.name,
                    tool_call.function.arguments,
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result),
                })
        else:
            # LLM provides the final answer
            return response.content
```
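The loop above assumes an `execute_tool` dispatcher. A minimal sketch of one, using a name-to-callable registry; the registry, decorator, and the `get_weather` stub are hypothetical names for illustration. Arguments arrive as a JSON string, as in OpenAI-style tool calls:

```python
import json

TOOL_REGISTRY = {}

def register_tool(name):
    """Decorator that registers a callable under a tool name."""
    def deco(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return deco

def execute_tool(name, arguments):
    if name not in TOOL_REGISTRY:
        return {"error": f"unknown tool: {name}"}
    try:
        kwargs = json.loads(arguments) if isinstance(arguments, str) else arguments
        return TOOL_REGISTRY[name](**kwargs)
    except Exception as e:
        # Return errors as data so the LLM can read them and retry,
        # instead of crashing the agent loop
        return {"error": str(e)}

@register_tool("get_weather")
def get_weather(city: str):
    return {"city": city, "temp_c": 21}  # stub result for illustration

execute_tool("get_weather", '{"city": "Paris"}')
```

Returning errors as structured results rather than raising is a common design choice: the model sees the failure in the next turn and can correct its arguments or pick a different tool.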
The Importance of Tool Descriptions
Well-written tool descriptions are key to successful tool use: the LLM relies on them to decide when to call a tool and which one to call.
```python
# Good tool description
good_tool = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base. Use when the user asks about company policies, product documentation, or technical specifications. Not suitable for general knowledge questions.",
    "parameters": {
        "query": {
            "type": "string",
            "description": "Search query in natural language describing the information to find"
        },
        "category": {
            "type": "string",
            "enum": ["policy", "product", "technical"],
            "description": "Search category to help narrow the scope"
        }
    }
}

# Poor tool description
bad_tool = {
    "name": "search",
    "description": "Search",
    "parameters": {
        "q": {"type": "string"}
    }
}
```
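Before a spec like `good_tool` reaches the model, it must be wrapped in the provider's wire format. A minimal sketch for the JSON-schema "tools" shape used by OpenAI-style chat APIs (field names follow that convention; other providers such as Anthropic use a slightly different layout, and the all-parameters-required assumption here is a simplification):

```python
def to_openai_tool(tool: dict) -> dict:
    """Wrap a flat tool spec in an OpenAI-style function-tool envelope."""
    props = tool["parameters"]
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": {
                "type": "object",
                "properties": props,
                "required": list(props),  # simplification: mark all required
            },
        },
    }

# Illustrative spec in the same flat shape as good_tool above
spec = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base.",
    "parameters": {
        "query": {"type": "string", "description": "Natural-language search query"},
    },
}
openai_tool = to_openai_tool(spec)
```

The resulting dict is what goes into the `tools` list passed alongside the messages on each chat call.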
Chapter Structure
- Function Calling Mechanisms - Function calling protocols across platforms
- MCP and Tool Protocols - Model Context Protocol and tool standardization
- Code Execution and Sandboxing - Secure code execution environments
- Browser and Computer Operations - GUI interaction capabilities
- API Orchestration and Tool Selection - Tool selection and orchestration strategies
References
- Schick, T., et al. (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools"
- Nakano, R., et al. (2021). "WebGPT: Browser-assisted question-answering with human feedback"
- Shen, Y., et al. (2023). "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face"
- Patil, S. G., et al. (2023). "Gorilla: Large Language Model Connected with Massive APIs"
- Qin, Y., et al. (2024). "Tool Learning with Large Language Models: A Survey"