Overview of Tool Use

Introduction

Although language models excel at text generation and reasoning, they have inherent limitations in precise computation, real-time information retrieval, and code execution. Tool Use enables agents to invoke external functions and APIs, vastly expanding their capability boundaries.

The Symbol Grounding Problem

Harnad (1990) posed the question: how do symbols in a symbolic system acquire meaning? LLMs process tokens (symbols), but real-world operations require interaction with the physical world and digital systems. Tool use is a key pathway for LLMs to achieve "grounding."

graph LR
    subgraph "LLM without Tools"
        A[Language Ability] --> B[Text Generation]
        A --> C[Reasoning]
        A --> D[Knowledge Recall]
    end

    subgraph "Tool-Augmented Agent"
        E[Language Ability] --> F[Text Generation]
        E --> G[Reasoning]
        E --> H[Knowledge Recall]
        E --> I[Tool Invocation]
        I --> J[Code Execution]
        I --> K[API Calls]
        I --> L[Database Queries]
        I --> M[Browser Operations]
        I --> N[File I/O]
    end

Development History

Early Explorations

| Date | Work | Contribution |
| --- | --- | --- |
| 2021 | WebGPT (OpenAI) | LLM learns to use search engines |
| 2022 | LaMDA (Google) | Tool invocation within dialogue |
| 2023.02 | Toolformer (Meta) | LLM self-learns when and how to use tools |
| 2023.03 | HuggingGPT | LLM as a controller dispatching AI models |
| 2023.03 | ChatGPT Plugins | Commercialized tool ecosystem |
| 2023.06 | Function Calling (OpenAI) | Standardized function calling protocol |
| 2023.10 | Gorilla | Fine-tuned model for large-scale API calls |
| 2024.03 | Claude Tool Use (Anthropic) | Native tool use support |
| 2024.11 | MCP (Anthropic) | Model Context Protocol standard |

Core Ideas of Toolformer

Schick et al. (2023) introduced Toolformer, which enables LLMs to autonomously learn tool usage:

  1. Insert API call annotations at positions in training text where tools might be needed
  2. Execute API calls to obtain results
  3. Retain only useful API calls (those that reduce perplexity)
  4. Fine-tune the model on this data
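The perplexity filter in step 3 can be sketched as follows. This is a hypothetical illustration, not Toolformer's actual code: `lm_loss` is an assumed helper returning the language model's loss on a continuation given a prefix, and the threshold value is illustrative.

```python
# Sketch of Toolformer's filtering criterion (step 3, hypothetical helper names).
# An API call is kept only if inserting the call *and its result* lowers the
# model's loss on the following tokens, compared with the better of two
# baselines: no call at all, or the call text without its result.

def keep_api_call(prefix, call_text, result_text, continuation,
                  lm_loss, threshold=1.0):
    """Return True if the annotated API call is worth keeping."""
    loss_with_result = lm_loss(prefix + call_text + result_text, continuation)
    loss_no_call = lm_loss(prefix, continuation)
    loss_call_only = lm_loss(prefix + call_text, continuation)
    baseline = min(loss_no_call, loss_call_only)
    # Keep the call only if the result helps by at least `threshold`
    return baseline - loss_with_result >= threshold
```

Calls that survive this filter form the fine-tuning data in step 4, so the model learns to emit API calls exactly where they reduce its own uncertainty.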

Why Tools Matter

Inherent Limitations of LLMs

| Limitation | Example | Tool Solution |
| --- | --- | --- |
| Imprecise math | Errors in large-number multiplication | Calculator / code executor |
| Knowledge cutoff | Unaware of latest news | Search engine / API |
| Cannot perform actions | Cannot send emails | Email API |
| No access to private data | Unaware of internal company info | Database queries |
| Limited multimodal ability | Cannot generate images | DALL-E / Stable Diffusion API |
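The first row of the table above can be made concrete: a calculator tool returns exact results for arithmetic that an LLM, predicting digits token by token, often gets wrong. This is a minimal sketch using a restricted `eval`; a production tool would use a proper expression parser or a sandboxed code executor.

```python
def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression exactly.

    Sketch only: the character whitelist blocks names and attribute
    access, but a real tool should parse the expression properly
    rather than rely on eval.
    """
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported characters in expression")
    # Empty builtins so only literal arithmetic is possible
    return str(eval(expression, {"__builtins__": {}}, {}))
```

For example, `calculator("123456789 * 987654321")` returns the exact product, whereas an LLM answering from memory frequently gets some middle digits wrong.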

Classification of Tool Use

graph TB
    TU[Tool Use] --> INFO[Information Retrieval]
    TU --> ACT[Action Execution]
    TU --> COMP[Computation Augmentation]
    TU --> CREATE[Content Creation]

    INFO --> S[Search Engines]
    INFO --> DB[Database Queries]
    INFO --> API1[Information APIs]

    ACT --> EMAIL[Send Emails]
    ACT --> FILE[File Operations]
    ACT --> DEPLOY[Deploy Services]

    COMP --> CODE[Code Execution]
    COMP --> CALC[Calculator]
    COMP --> DATA[Data Analysis]

    CREATE --> IMG[Image Generation]
    CREATE --> DOC[Document Generation]
    CREATE --> CHART[Chart Drawing]

Basic Workflow of Tool Use

1. Understand Task → 2. Plan Tool Usage → 3. Select Tool → 4. Construct Parameters → 5. Execute Call → 6. Parse Result → 7. Integrate into Response
# Basic tool use loop
def agent_loop(query, tools, llm):
    messages = [{"role": "user", "content": query}]

    while True:
        response = llm.chat(messages, tools=tools)

        if response.finish_reason == "tool_calls":
            # Record the assistant turn that requested the calls;
            # tool messages must reference it via tool_call_id
            messages.append(response.message)
            for tool_call in response.tool_calls:
                result = execute_tool(
                    tool_call.function.name,
                    tool_call.function.arguments
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": str(result)
                })
        else:
            # LLM provides the final answer
            return response.content
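The `execute_tool` helper used in the loop above is not defined in this chapter. One common pattern is a name-to-function registry that parses the JSON argument string and returns errors as plain text, so the LLM can see the failure and retry. A minimal sketch (the registry and the `add` tool are illustrative, not a real API):

```python
import json

# Illustrative registry mapping tool names to Python callables
TOOL_REGISTRY = {}

def register_tool(name):
    """Decorator that adds a function to the registry under `name`."""
    def wrap(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register_tool("add")
def add(a, b):
    return a + b

def execute_tool(name, arguments):
    """Dispatch a tool call; return the result (or an error) as a string.

    Errors are returned as text rather than raised, so they land in the
    `tool` message and the LLM gets a chance to correct itself.
    """
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        return f"Error: unknown tool '{name}'"
    try:
        kwargs = json.loads(arguments) if isinstance(arguments, str) else arguments
        return str(fn(**kwargs))
    except Exception as exc:
        return f"Error: {exc}"
```

Returning errors in-band is a deliberate design choice: a raised exception would crash the loop, while an error string keeps the conversation going and often lets the model fix its own malformed arguments.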

The Importance of Tool Descriptions

Good tool descriptions are key to successful tool use. LLMs rely on tool descriptions to decide when to use which tool.

# Good tool description
good_tool = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base. Use when the user asks about company policies, product documentation, or technical specifications. Not suitable for general knowledge questions.",
    "parameters": {
        "query": {
            "type": "string",
            "description": "Search query in natural language describing the information to find"
        },
        "category": {
            "type": "string",
            "enum": ["policy", "product", "technical"],
            "description": "Search category to help narrow the scope"
        }
    }
}

# Poor tool description
bad_tool = {
    "name": "search",
    "description": "Search",
    "parameters": {
        "q": {"type": "string"}
    }
}
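Even a well-described tool benefits from validating the arguments the LLM supplies before execution. Below is a hypothetical sketch of checking arguments against the parameter format used above, covering only `type` and `enum`; real systems typically run a full JSON Schema validator instead.

```python
def validate_arguments(tool, arguments):
    """Check arguments against a tool's parameter schema.

    Returns a list of problem descriptions; an empty list means valid.
    Sketch only: handles presence, `type`, and `enum`, not nested
    objects or optional parameters.
    """
    problems = []
    type_map = {
        "string": str,
        "number": (int, float),
        "integer": int,
        "boolean": bool,
    }
    for name, spec in tool["parameters"].items():
        if name not in arguments:
            problems.append(f"missing parameter: {name}")
            continue
        value = arguments[name]
        expected = type_map.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"{name}: expected {spec['type']}")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: must be one of {spec['enum']}")
    return problems
```

Rejecting invalid calls with a descriptive message, rather than executing them, gives the LLM the same self-correction opportunity as the error-as-text pattern in the agent loop.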

Chapter Structure

  1. Function Calling Mechanisms - Function calling protocols across platforms
  2. MCP and Tool Protocols - Model Context Protocol and tool standardization
  3. Code Execution and Sandboxing - Secure code execution environments
  4. Browser and Computer Operations - GUI interaction capabilities
  5. API Orchestration and Tool Selection - Tool selection and orchestration strategies

References

  • Schick, T., et al. (2023). "Toolformer: Language Models Can Teach Themselves to Use Tools"
  • Nakano, R., et al. (2021). "WebGPT: Browser-assisted question-answering with human feedback"
  • Shen, Y., et al. (2023). "HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face"
  • Patil, S. G., et al. (2023). "Gorilla: Large Language Model Connected with Massive APIs"
  • Qin, Y., et al. (2024). "Tool Learning with Large Language Models: A Survey"
