# API Orchestration and Tool Selection

## Introduction
When an agent has access to a large number of tools, selecting the right tools, invoking them in the correct order, and handling errors along the way become critical engineering challenges. This section explores tool selection algorithms, orchestration patterns, and error handling strategies.
## The Tool Selection Problem

### Challenges
- Too many tools: With hundreds of tools, LLMs struggle to process all tool definitions within context
- Semantic overlap: Multiple tools with similar functionality require precise differentiation
- Context consumption: Each tool definition consumes hundreds of tokens
- Selection accuracy: Selecting the wrong tool leads to task failure
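To make the context-consumption point concrete, here is a rough back-of-envelope sketch. The 4-characters-per-token ratio is a common rule of thumb rather than a tokenizer measurement, and `estimate_tool_context_cost` is a hypothetical helper, not part of any framework:

```python
import json

# Rough sketch: estimate how many tokens a set of tool definitions consumes.
# The 4-chars-per-token ratio is an approximation, not a tokenizer measurement.
def estimate_tool_context_cost(tool_definitions, chars_per_token=4):
    total_chars = sum(len(json.dumps(t)) for t in tool_definitions)
    return total_chars // chars_per_token

# 200 modest tool definitions already consume tens of thousands of tokens
tools = [
    {
        "name": f"tool_{i}",
        "description": "Does one specific thing, described in a few sentences. " * 5,
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    }
    for i in range(200)
]
print(estimate_tool_context_cost(tools))
```

With an 8k-token budget, even this crude estimate shows why loading every definition quickly crowds out the conversation itself.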
## Tool Selection Strategies

### 1. Full Loading (Suitable for Few Tools)

```python
# When tools < 20, pass all directly to the LLM
response = llm.chat(
    messages=messages,
    tools=all_tools,  # All tool definitions
)
```
### 2. Semantic Routing

Pre-filter relevant tools based on the semantic content of the user query:

```python
class ToolRouter:
    def __init__(self, tools, embedding_model):
        self.tools = tools
        self.embed = embedding_model
        # Pre-compute embeddings for all tool descriptions
        self.tool_embeddings = {
            tool["name"]: self.embed(tool["description"])
            for tool in tools
        }

    def select_tools(self, query, top_k=5):
        """Select the most relevant tools based on semantic similarity"""
        query_embedding = self.embed(query)
        scores = {}
        for name, emb in self.tool_embeddings.items():
            scores[name] = cosine_similarity(query_embedding, emb)
        # Return the top_k most relevant tools
        sorted_tools = sorted(scores.items(), key=lambda x: x[1], reverse=True)
        selected_names = [name for name, score in sorted_tools[:top_k]]
        return [t for t in self.tools if t["name"] in selected_names]

# Usage
router = ToolRouter(all_tools, embedding_model)
relevant_tools = router.select_tools("Check the weather in Beijing for me")
response = llm.chat(messages=messages, tools=relevant_tools)
```
### 3. Category Routing

Use an LLM to classify first, then load tools from the corresponding category:

```python
TOOL_CATEGORIES = {
    "information_retrieval": ["web_search", "knowledge_base_query", "database_query"],
    "data_analysis": ["execute_python", "create_chart", "statistical_test"],
    "communication": ["send_email", "send_slack", "create_document"],
    "file_operations": ["read_file", "write_file", "list_directory"],
}

def category_routing(query, llm):
    """Classify first, then select tools"""
    category = llm.classify(
        f"Classify the following query into a tool category: {list(TOOL_CATEGORIES.keys())}\n"
        f"Query: {query}\nCategory:"
    )
    return TOOL_CATEGORIES.get(category, [])
```
### 4. Two-Stage Selection

Use a lightweight model for initial filtering, then a stronger model for precise selection:

```python
def two_stage_selection(query, all_tools):
    # Stage 1: lightweight model for quick filtering (low cost)
    candidates = light_model.select(
        query=query,
        tools=[
            {"name": t["name"], "description": t["description"][:100]}
            for t in all_tools
        ],
        top_k=10,
    )
    # Stage 2: strong model for precise selection (with full definitions)
    selected = strong_model.select(
        query=query,
        tools=[t for t in all_tools if t["name"] in candidates],
    )
    return selected
```
## Tool Chaining Patterns

### Sequential Chain

```python
# Tool A's output becomes Tool B's input
async def sequential_chain(query):
    # Step 1: Search
    search_results = await search_tool(query)
    # Step 2: Query the database with entities extracted from the search results
    db_results = await database_query(extract_entities(search_results))
    # Step 3: Analyze
    analysis = await code_executor(generate_analysis_code(db_results))
    return analysis
```
### Parallel Fan-out

```python
import asyncio

async def fan_out(query):
    """Call multiple tools in parallel and aggregate the results"""
    tasks = [
        web_search(query),
        knowledge_base_search(query),
        database_query(query),
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Filter out errors
    valid_results = [r for r in results if not isinstance(r, Exception)]
    return merge_results(valid_results)
```
### Conditional Branching

```python
async def conditional_routing(query, context):
    """Choose a tool chain based on the classified intent"""
    intent = classify_intent(query)
    if intent == "factual_question":
        # Knowledge retrieval chain
        docs = await knowledge_base_search(query)
        return generate_answer(query, docs)
    elif intent == "data_analysis":
        # Data analysis chain
        data = await database_query(extract_sql(query))
        code = generate_analysis_code(data)
        return await code_executor(code)
    elif intent == "action_request":
        # Action execution chain
        plan = plan_actions(query)
        results = []
        for step in plan:
            result = await execute_action(step)
            results.append(result)
            if result.get("error"):
                break  # Stop the chain at the first failed step
        return summarize_results(results)
    else:
        # Unrecognized intent: answer directly without tools
        return generate_answer(query, [])
```
### Recursive Tool Use

The agent discovers during execution that additional tool calls are needed:

```python
async def recursive_tool_use(query, tools, llm, depth=0, max_depth=5):
    """Recursive tool invocation"""
    if depth >= max_depth:
        return "Maximum recursion depth reached"
    response = await llm.chat(
        messages=[{"role": "user", "content": query}],
        tools=tools,
    )
    if not response.tool_calls:
        return response.content
    # Execute tool calls
    results = []
    for tc in response.tool_calls:
        result = await execute_tool(tc.name, tc.arguments)
        results.append(result)
    # Feed results back to the LLM, potentially triggering more tool calls
    follow_up = format_results(results)
    return await recursive_tool_use(follow_up, tools, llm, depth + 1, max_depth)
```
## Error Handling and Retries

### Error Classification
| Error Type | Example | Handling Strategy |
|---|---|---|
| Parameter error | Missing required parameter | Let LLM correct the parameters |
| Authentication failure | Expired API key | Refresh credentials and retry |
| Rate limiting | 429 Too Many Requests | Exponential backoff retry |
| Service unavailable | 500 Server Error | Wait and retry, or degrade |
| Logic error | Query returns no results | Rewrite query or switch tools |
| Timeout | Execution takes too long | Retry with timeout or simplify request |
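The taxonomy above can be sketched as a dispatcher that maps a failed call to a handling strategy. `ToolError` and the strategy names here are illustrative assumptions, not a specific framework's API:

```python
# Illustrative sketch of the error taxonomy above; ToolError and the strategy
# names are assumptions for this example, not a real framework's API.
class ToolError(Exception):
    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code

def classify_error(error):
    """Map a tool-call error to one of the handling strategies in the table."""
    code = getattr(error, "status_code", None)
    if code == 400:
        return "fix_parameters"       # Parameter error: let the LLM correct them
    if code in (401, 403):
        return "refresh_credentials"  # Authentication failure
    if code == 429:
        return "backoff_retry"        # Rate limiting
    if code in (500, 502, 503, 504):
        return "wait_and_retry"       # Service unavailable: retry or degrade
    if isinstance(error, TimeoutError):
        return "retry_with_timeout"   # Timeout: retry or simplify the request
    return "rewrite_or_switch_tool"   # Logic error / anything else
```

Routing errors through a single classifier like this keeps the retry logic below from having to special-case every tool.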
### Retry Strategies

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
)
async def resilient_tool_call(tool_name, arguments):
    """Tool call with retry logic"""
    return await execute_tool(tool_name, arguments)


class SmartRetry:
    def __init__(self, llm, max_attempts=3):
        self.llm = llm
        self.max_attempts = max_attempts

    async def execute_with_recovery(self, tool_name, arguments, error=None, attempt=0):
        """Intelligent error recovery"""
        if error:
            # Let the LLM analyze the error and correct the arguments
            fix = await self.llm.generate(
                f"Tool {tool_name} call failed.\n"
                f"Arguments: {arguments}\n"
                f"Error: {error}\n"
                f"Please correct the arguments or suggest an alternative."
            )
            arguments = fix.get("corrected_arguments", arguments)
        try:
            return await execute_tool(tool_name, arguments)
        except Exception as e:
            # Bound the recursion so a persistently failing tool cannot loop forever
            if self.is_retryable(e) and attempt < self.max_attempts:
                return await self.execute_with_recovery(
                    tool_name, arguments, str(e), attempt + 1
                )
            raise

    def is_retryable(self, error):
        retryable_codes = [429, 500, 502, 503, 504]
        return getattr(error, "status_code", None) in retryable_codes
```
### Fallback Strategies

```python
class ToolWithFallback:
    def __init__(self, primary_tool, fallback_tools):
        self.primary = primary_tool
        self.fallbacks = fallback_tools

    async def execute(self, arguments):
        """Try fallback tools when the primary tool fails"""
        try:
            return await self.primary(arguments)
        except Exception as primary_error:
            for fallback in self.fallbacks:
                try:
                    return await fallback(arguments)
                except Exception:
                    continue  # Try the next fallback
            raise primary_error

# Example: fallback chain for search tools
search = ToolWithFallback(
    primary_tool=google_search,
    fallback_tools=[bing_search, brave_search, duckduckgo_search],
)
```
## Tool Usage Decision Framework

### When to Use Which Tool

```python
TOOL_DECISION_TREE = """
User query →
├─ Need latest information? → web_search
├─ Need internal knowledge? → knowledge_base_search
├─ Need precise computation? → code_interpreter
├─ Need data analysis? → code_interpreter + data_tools
├─ Need to perform an action? →
│   ├─ Send a message? → email/slack_tool
│   ├─ File operation? → file_tools
│   └─ System operation? → bash_tool
├─ Need multi-step reasoning? → Combine multiple tools
└─ Pure knowledge Q&A? → No tools needed, answer directly
"""
```
## Practical Recommendations

### Orchestration Checklist
- [ ] Implement tool routing to reduce interference from irrelevant tools
- [ ] Define clear usage scenarios and exclusion conditions for each tool
- [ ] Implement error handling and retry mechanisms
- [ ] Set timeouts and resource limits for tool calls
- [ ] Add logging and monitoring for tool usage
- [ ] Add human confirmation steps for high-risk tools
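For the timeout item in the checklist, a minimal sketch using `asyncio.wait_for`; the slow tool here is a placeholder standing in for any real tool coroutine:

```python
import asyncio

# Sketch of per-call timeout enforcement; the slow tool is a placeholder.
async def call_with_timeout(tool_coro, timeout_s=30.0):
    """Run a tool call, converting a hang into a handled timeout error."""
    try:
        return await asyncio.wait_for(tool_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"error": f"tool call exceeded {timeout_s}s timeout"}

async def main():
    async def slow_tool():
        await asyncio.sleep(10)  # Simulates a hung tool
        return {"result": "done"}
    result = await call_with_timeout(slow_tool(), timeout_s=0.1)
    print(result["error"])  # prints "tool call exceeded 0.1s timeout"

asyncio.run(main())
```

Returning a structured error instead of raising lets the agent loop feed the timeout back to the LLM as a normal tool result.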
### Performance Optimization
- Call independent tools in parallel to reduce latency
- Cache results from frequently used tools
- Use lightweight models for pre-filtering
- Truncate tool results to avoid wasting context space
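The caching recommendation can be sketched with a small TTL cache keyed on tool name plus arguments; the 60-second TTL is an arbitrary assumption, and `ToolResultCache` is a hypothetical helper, not a library class:

```python
import json
import time

# Minimal TTL cache for tool results; the 60-second default is an assumption.
class ToolResultCache:
    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, tool_name, arguments):
        # JSON with sorted keys makes equivalent argument dicts key identically
        return (tool_name, json.dumps(arguments, sort_keys=True))

    def get(self, tool_name, arguments):
        entry = self._store.get(self._key(tool_name, arguments))
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            return None  # Expired: caller should re-run the tool
        return value

    def put(self, tool_name, arguments, result):
        self._store[self._key(tool_name, arguments)] = (result, time.monotonic())
```

A TTL keeps cached weather or search results from going stale, while idempotent lookups within a single agent run hit the cache instead of the network.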
## Further Reading
- Framework Selection Overview - How agent frameworks handle tool orchestration
- Qin, Y., et al. (2024). "Tool Learning with Large Language Models: A Survey"
- Hao, S., et al. (2023). "ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings"