Code Execution and Sandboxing
Introduction
Code execution is one of the most powerful tools available to agents. By writing and running code, agents can perform precise calculations, data analysis, file processing, and more. However, executing arbitrary code poses serious security risks, making sandbox technology an indispensable safety measure.
Code Interpreters
OpenAI Code Interpreter
OpenAI provides a built-in code interpreter in ChatGPT and the Assistants API:
from openai import OpenAI
client = OpenAI()
# Using Code Interpreter in the Assistants API
assistant = client.beta.assistants.create(
    name="Data Analysis Assistant",
    instructions="You are a data analysis assistant skilled in Python data analysis and visualization.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}],
)
# Upload a file (the context manager closes the local file handle afterwards)
with open("data.csv", "rb") as fh:
    file = client.files.create(file=fh, purpose="assistants")
# Create a conversation with the file
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze this data and generate visualizations",
    attachments=[{"file_id": file.id, "tools": [{"type": "code_interpreter"}]}],
)
# Run the assistant and block until the run reaches a terminal state
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
Features:
- Pre-installed Python with common libraries (pandas, numpy, matplotlib, etc.)
- Automatically sandboxed with no external network access
- Supports file upload and download
- Can generate charts and files
Claude Code Execution
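Claude can execute code either through Claude Code or through a tool that the application defines:
# Claude Code (CLI tool) executes code directly on the local machine
# via Bash tool and file operation tools
# In the Anthropic API, tool use can define a code execution tool.
# The model only requests the call; the application runs the code, ideally in a sandbox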
tools = [
    {
        "name": "execute_python",
        "description": "Execute Python code and return results",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute"
                }
            },
            "required": ["code"]
        }
    }
]
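Defining the tool only lets the model ask for code to be run; the application still has to execute it and send the result back. A minimal sketch of that step follows. The helper name `handle_tool_use` is hypothetical, and the block is assumed to be a plain dict with the `id`, `name`, and `input` fields of a tool use content block. The plain subprocess used here is a stand-in for illustration only and provides no isolation.

```python
import subprocess
import sys


def handle_tool_use(block: dict, timeout: float = 10.0) -> dict:
    """Run an `execute_python` tool call and build the matching tool_result block."""
    if block.get("name") != "execute_python":
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"Unknown tool: {block.get('name')}",
            "is_error": True,
        }
    try:
        # -I starts Python in isolated mode (ignores env vars and user site-packages).
        # NOTE: a plain subprocess is NOT a sandbox; swap in a real one in production.
        proc = subprocess.run(
            [sys.executable, "-I", "-c", block["input"]["code"]],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"Timed out after {timeout} seconds",
            "is_error": True,
        }
    failed = proc.returncode != 0
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        # Return stderr on failure so the model can see the traceback and retry
        "content": proc.stderr if failed else proc.stdout,
        "is_error": failed,
    }
```

The returned dict would be sent back in the next user message so the model can read the output or react to the error.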
Sandbox Technologies
Why Sandboxing Is Needed
Unrestricted code execution can lead to:
- Filesystem damage (deleting files, overwriting system files)
- Network attacks (sending malicious requests)
- Resource exhaustion (infinite loops, memory overflow)
- Data leakage (reading sensitive files)
- Privilege escalation (exploiting system vulnerabilities)
Sandbox Levels
| Level | Isolation | Technology | Overhead | Security |
|---|---|---|---|---|
| Process-level | Low | subprocess + restrictions | Very low | Low |
| Container-level | Medium | Docker | Low | Medium |
| Micro-VM / user-space kernel | High | Firecracker / gVisor | Medium | High |
| Full VM | Highest | QEMU/KVM | High | Highest |
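The process-level row can be illustrated with the standard library alone. The sketch below assumes a Linux or other POSIX host (the `resource` module is not available on Windows). The names `run_limited` and `_apply_limits` and the specific limit values are hypothetical choices. It caps CPU time, address space, and wall-clock time, but it does not restrict filesystem or network access, which is why the table rates this level as low security.

```python
import resource
import subprocess
import sys


def _soft_limit(kind: int, wanted: int) -> None:
    """Set the soft limit for `kind`, never exceeding the existing hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(kind, (wanted, hard))


def _apply_limits() -> None:
    # Runs in the child process just before the Python interpreter is exec'd.
    _soft_limit(resource.RLIMIT_CPU, 5)          # seconds of CPU time
    _soft_limit(resource.RLIMIT_AS, 1024 ** 3)   # 1 GiB of address space


def run_limited(code: str, wall_timeout: float = 10.0) -> dict:
    """Run `code` in a child interpreter with CPU, memory, and wall-clock caps."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=wall_timeout,
            preexec_fn=_apply_limits,  # POSIX only; avoid in multi-threaded parents
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": -1, "output": "", "error": "wall-clock timeout"}
    return {
        "exit_code": proc.returncode,
        "output": proc.stdout,
        "error": proc.stderr or None,
    }
```

The wall-clock timeout catches code that sleeps or blocks, while `RLIMIT_CPU` catches busy loops; the two limits complement each other.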
Docker Sandbox
import os
import tempfile
import docker
class DockerSandbox:
    def __init__(self, image="python:3.11-slim", timeout=30):
        self.client = docker.from_env()
        self.image = image
        self.timeout = timeout
    def execute(self, code: str) -> dict:
        """Safely execute code in a Docker container"""
        # Write the code to a temporary file that is mounted read-only
        with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
            f.write(code)
            code_path = f.name
        container = None  # Stays None if the container fails to start
        try:
            container = self.client.containers.run(
                self.image,
                command="python /code/script.py",
                volumes={code_path: {"bind": "/code/script.py", "mode": "ro"}},
                # Security restrictions
                network_disabled=True,  # Disable network
                mem_limit="256m",  # Memory limit
                cpu_period=100000,
                cpu_quota=50000,  # CPU limit (50%)
                read_only=True,  # Read-only filesystem
                tmpfs={"/tmp": "size=64m"},  # Writable temp directory
                detach=True,
            )
            # Wait for execution to complete; raises if the timeout expires
            result = container.wait(timeout=self.timeout)
            logs = container.logs().decode("utf-8", errors="replace")
            return {
                "exit_code": result["StatusCode"],
                "output": logs,
                "error": None if result["StatusCode"] == 0 else logs,
            }
        except Exception as e:
            return {"exit_code": -1, "output": "", "error": str(e)}
        finally:
            # Clean up the container (if it started) and the temporary file
            if container is not None:
                try:
                    container.remove(force=True)
                except docker.errors.APIError:
                    pass
            os.unlink(code_path)
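Beyond the limits used in `DockerSandbox`, containers are commonly hardened by dropping Linux capabilities, blocking privilege escalation, capping the number of processes, and running as a non-root user. The sketch below collects such options as keyword arguments for docker-py's `containers.run`. The function name `hardened_run_kwargs` and the specific values are assumptions for illustration, and the `nobody` user is assumed to exist in the chosen image.

```python
def hardened_run_kwargs(mem_limit: str = "256m", pids_limit: int = 64) -> dict:
    """Keyword arguments for `client.containers.run` with tighter defaults."""
    return {
        "network_disabled": True,                    # No network access
        "mem_limit": mem_limit,                      # Memory cap
        "read_only": True,                           # Read-only root filesystem
        "tmpfs": {"/tmp": "size=64m"},               # Small writable scratch space
        "cap_drop": ["ALL"],                         # Drop every Linux capability
        "security_opt": ["no-new-privileges:true"],  # Block setuid-style escalation
        "pids_limit": pids_limit,                    # Mitigates fork bombs
        "user": "nobody",                            # Do not run as root
        "detach": True,
    }


# Usage (requires a running Docker daemon):
#   client.containers.run(image, command="python /code/script.py",
#                         volumes=..., **hardened_run_kwargs())
```

Keeping the options in one function makes the security posture easy to review and to reuse across sandboxes.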
E2B Sandbox
E2B provides cloud-based sandbox environments designed specifically for AI agents:
from e2b_code_interpreter import Sandbox
# Create sandbox
sandbox = Sandbox()
# Execute code
execution = sandbox.run_code("""
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
'x': range(10),
'y': [i**2 for i in range(10)]
})
plt.figure(figsize=(8, 6))
plt.plot(df['x'], df['y'])
plt.title('y = x²')
plt.savefig('/tmp/plot.png')
print(df.describe())
""")
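print(execution.logs.stdout) # Captured standard output
print(execution.text) # Text form of the final expression's value, if any
print(execution.error) # Error details, or None on success
print(execution.results) # Includes charts and other results
# Download generated files (binary files should be read as bytes)
content = sandbox.files.read("/tmp/plot.png", format="bytes")
# Shut down the sandbox (method names vary across SDK versions; older releases used close())
sandbox.kill()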
E2B Advantages:
- Cloud-based execution, no impact on the local environment
- Millisecond-level startup
- Supports persistent filesystem
- Pre-installed common Python packages
- Provides custom templates
Modal Serverless Execution
import modal
app = modal.App("agent-sandbox")
@app.function(
    image=modal.Image.debian_slim().pip_install("pandas", "numpy", "matplotlib"),
    timeout=60,
    memory=512,
)
def execute_code(code: str) -> str:
    """Execute code in a Modal serverless environment"""
    import contextlib
    import io
    # Capture everything the code prints to stdout
    output = io.StringIO()
    try:
        with contextlib.redirect_stdout(output):
            # Use a fresh namespace so the code cannot touch this function's variables
            exec(code, {"__name__": "__main__"})
        return output.getvalue()
    except Exception as e:
        return f"Error: {e}"
Security Best Practices
Defense in Depth
class SecureCodeExecutor:
    def __init__(self):
        self.forbidden_modules = [
            "os", "sys", "subprocess", "shutil",
            "socket", "http", "urllib", "requests",
            "ctypes", "importlib",
        ]
        self.forbidden_builtins = [
            "exec", "eval", "compile", "__import__",
            "open", "input",
        ]
    def static_analysis(self, code: str) -> list:
        """Static analysis: check for dangerous code patterns"""
        warnings = []
        for module in self.forbidden_modules:
            if f"import {module}" in code or f"from {module}" in code:
                warnings.append(f"Forbidden module import: {module}")
        for builtin in self.forbidden_builtins:
            if f"{builtin}(" in code:
                warnings.append(f"Forbidden built-in function: {builtin}")
        if "while True" in code or "while 1" in code:
            warnings.append("Potential infinite loop detected")
        return warnings
    def execute(self, code: str) -> dict:
        """Securely execute code"""
        # 1. Static analysis
        warnings = self.static_analysis(code)
        if warnings:
            return {"error": f"Security check failed: {warnings}"}
        # 2. Execute in sandbox
        sandbox = DockerSandbox(timeout=30)
        return sandbox.execute(code)
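Substring matching produces false positives (for example, `import osmnx` contains `import os`, and `reopen(` contains `open(`) and also flags text inside comments and strings. One possible refinement, sketched below with Python's `ast` module, inspects the parsed syntax tree instead. The function name `find_violations` and the shortened deny lists are assumptions for illustration. Static checks of either kind can be bypassed by determined code, so they are only a first filter in front of the sandbox.

```python
import ast

FORBIDDEN_MODULES = {"os", "sys", "subprocess", "shutil", "socket", "ctypes", "importlib"}
FORBIDDEN_CALLS = {"exec", "eval", "compile", "__import__", "open", "input"}


def find_violations(code: str) -> list[str]:
    """Return a list of policy violations found in the parsed source."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"Syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        # Collect the module names referenced by import statements
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            root = name.split(".")[0]  # "os.path" counts as "os"
            if root in FORBIDDEN_MODULES:
                violations.append(f"Forbidden module import: {root}")

        # Flag direct calls such as open(...) but not methods like f.open(...)
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in FORBIDDEN_CALLS
        ):
            violations.append(f"Forbidden built-in function: {node.func.id}")
    return violations
```

Because the check works on syntax nodes, commented-out imports and string literals that merely mention a module name are ignored.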
Security Checklist
| Measure | Description |
|---|---|
| Network isolation | Disable container network access |
| Filesystem restriction | Read-only + limited temp space |
| Resource limits | CPU, memory, disk I/O caps |
| Timeout control | Hard limit on execution time |
| Module allowlist | Only allow safe Python packages |
| Output limits | Cap output size to limit data leakage and log flooding |
| Audit logs | Log all executed code and results |
| User confirmation | Require human approval for high-risk operations |
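Two of the checklist items, output limits and audit logs, need no special infrastructure. The sketch below shows one way they might look; the names `truncate_output` and `audit_record`, the 10,000-character cap, and the record fields are all hypothetical choices rather than a prescribed format.

```python
import hashlib
import json
import time

MAX_OUTPUT_CHARS = 10_000  # Hypothetical cap; tune to the model's context budget


def truncate_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cap output size and say how much was dropped."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n... [truncated {len(text) - limit} characters]"


def audit_record(code: str, result: dict) -> str:
    """Return one JSON line describing an execution, for an append-only log."""
    return json.dumps({
        "timestamp": time.time(),
        # The hash makes it cheap to spot repeated or known-bad snippets
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "code": code,
        "exit_code": result.get("exit_code"),
        "output": truncate_output(result.get("output") or ""),
    })
```

Each record can be appended as a single line to a log file, which keeps the log easy to search and parse later.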
Applications of Code Execution in Agents
Data Analysis Agent
DATA_ANALYSIS_SYSTEM = """
You are a data analysis assistant. When the user provides data or analysis requirements:
1. Write Python code for analysis
2. Use pandas for data processing
3. Use matplotlib/seaborn for visualization
4. Explain the analysis results
Available libraries: pandas, numpy, matplotlib, seaborn, scipy, sklearn
"""
Mathematical Computation Agent
MATH_SYSTEM = """
When precise computation is needed, write Python code to execute it.
Do not attempt mental arithmetic; always verify with code.
Example:
User: Calculate 2^100
Code: print(2**100)
"""
Further Reading
- Code Generation Agents - Agent applications in software development
- E2B Documentation (e2b.dev)
- Modal Documentation (modal.com)
- Docker Security Best Practices