Code Execution and Sandboxing

Introduction

Code execution is one of the most powerful tools available to agents. By writing and running code, agents can perform precise calculations, data analysis, file processing, and more. However, executing arbitrary code poses serious security risks, making sandbox technology an indispensable safety measure.

Code Interpreters

OpenAI Code Interpreter

OpenAI provides a built-in code interpreter in ChatGPT and the Assistants API:

from openai import OpenAI

client = OpenAI()

# Using Code Interpreter in the Assistants API
assistant = client.beta.assistants.create(
    name="Data Analysis Assistant",
    instructions="You are a data analysis assistant skilled in Python data analysis and visualization.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}]
)

# Upload a file (the context manager closes the handle after the upload)
with open("data.csv", "rb") as f:
    file = client.files.create(file=f, purpose="assistants")

# Create a conversation with the file
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze this data and generate visualizations",
    attachments=[{"file_id": file.id, "tools": [{"type": "code_interpreter"}]}]
)

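# Run the assistant and block until the run reaches a terminal state
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Read the reply once the run has finished (messages are listed newest first)
if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content)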

Features:

Pre-installed Python with common libraries (pandas, numpy, matplotlib, etc.)
Automatically sandboxed with no external network access
Supports file upload and download
Can generate charts and files

Claude Code Execution

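Claude can run code in two ways: Claude Code, the CLI tool, executes commands directly on the local machine, while the Anthropic API lets you define a custom tool whose calls your own application executes:

# Claude Code (CLI tool) executes code directly on the local machine
# via its Bash tool and file operation tools.

# With the Anthropic API, tool use can expose a code execution tool.
# The model only requests the call; your application must run the code
# (ideally in a sandbox) and send the result back.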
tools = [
    {
        "name": "execute_python",
        "description": "Execute Python code and return results",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute"
                }
            },
            "required": ["code"]
        }
    }
]
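
The tool definition above only describes the call; the application still has to run the code when the model requests it. A minimal sketch of that step follows, assuming a hypothetical helper named `run_execute_python` and a plain child process as the executor. The returned keys mirror the `content` and `is_error` fields of a tool result, and the 10-second timeout is an arbitrary choice. A bare child process offers no real isolation, so in practice the body would delegate to one of the sandboxes covered in the next section.

```python
import subprocess
import sys

def run_execute_python(tool_input: dict, timeout: float = 10.0) -> dict:
    """Run the `code` field of an execute_python tool call (hypothetical helper).

    NOTE: a bare child process is NOT a sandbox; it is used here for illustration only.
    """
    code = tool_input["code"]
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"is_error": True, "content": f"Timed out after {timeout} seconds"}

    if proc.returncode != 0:
        # Report the traceback so the model can correct its code
        return {"is_error": True, "content": proc.stderr}
    return {"is_error": False, "content": proc.stdout}

result = run_execute_python({"code": "print(2 + 2)"})
print(result)
```

Returning errors as data rather than raising lets the agent loop pass the traceback back to the model, which can then retry with corrected code.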

Sandbox Technologies

Why Sandboxing Is Needed

Unrestricted code execution can lead to:

Filesystem damage (deleting files, overwriting system files)
Network attacks (sending malicious requests)
Resource exhaustion (infinite loops, memory overflow)
Data leakage (reading sensitive files)
Privilege escalation (exploiting system vulnerabilities)

Sandbox Levels

Level	Isolation	Technology	Overhead	Security
Process-level	Low	subprocess + restrictions	Very low	Low
Container-level	Medium	Docker	Low	Medium
Micro-VM / user-space kernel	High	Firecracker (micro-VM), gVisor (user-space kernel)	Medium	High
Full VM	Highest	QEMU/KVM	High	Highest
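
To make the first row of the table concrete, here is a minimal sketch of a process-level sandbox: a child Python process with CPU-time and address-space limits applied through the standard `resource` module. It assumes a Unix host (the `resource` module is not available on Windows), and the names `run_with_limits` and `_lower_limit` as well as the default limits are illustrative choices. This level restricts resource usage only; the child can still read files and open sockets, which is why the table rates it low.

```python
import resource
import subprocess
import sys

def _lower_limit(kind: int, value: int) -> None:
    """Lower one rlimit to `value`, never exceeding the existing hard limit."""
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, value))

def run_with_limits(code: str, cpu_seconds: int = 2, memory_mb: int = 512,
                    timeout: float = 5.0) -> dict:
    """Run code in a child process with rlimits (Unix only, illustrative)."""

    def apply_limits() -> None:
        # Runs in the child just before exec: cap CPU time and address space
        _lower_limit(resource.RLIMIT_CPU, cpu_seconds)
        _lower_limit(resource.RLIMIT_AS, memory_mb * 1024 * 1024)

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,          # wall-clock limit enforced by the parent
            preexec_fn=apply_limits,  # resource limits enforced by the kernel
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": -1, "output": "", "error": "timeout"}
    return {
        "exit_code": proc.returncode,
        "output": proc.stdout,
        "error": proc.stderr or None,
    }

print(run_with_limits("print('hello from the child process')"))
```

A busy loop is stopped by the CPU limit and an oversized allocation fails with `MemoryError`, both without affecting the parent process.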

Docker Sandbox

import os
import tempfile

import docker

class DockerSandbox:
    def __init__(self, image="python:3.11-slim", timeout=30):
        self.client = docker.from_env()
        self.image = image
        self.timeout = timeout

    def execute(self, code: str) -> dict:
        """Execute code in a locked-down Docker container"""
        # Write the code to a temporary file on the host
        with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
            f.write(code)
            code_path = f.name

        container = None  # Defined up front so cleanup is safe if run() fails
        try:
            container = self.client.containers.run(
                self.image,
                command="python /code/script.py",
                volumes={code_path: {"bind": "/code/script.py", "mode": "ro"}},
                # Security restrictions
                network_disabled=True,     # Disable network
                mem_limit="256m",          # Memory limit
                cpu_period=100000,
                cpu_quota=50000,           # CPU limit (50% of one core)
                read_only=True,            # Read-only filesystem
                tmpfs={"/tmp": "size=64m"},  # Writable temp directory
                detach=True,
            )

            # Wait for completion; wait() raises if the timeout expires
            result = container.wait(timeout=self.timeout)
            logs = container.logs().decode("utf-8", errors="replace")

            return {
                "exit_code": result["StatusCode"],
                "output": logs,
                "error": None if result["StatusCode"] == 0 else logs,
            }
        except Exception as e:
            return {"exit_code": -1, "output": "", "error": str(e)}
        finally:
            # Force-remove the container (this also kills it after a timeout)
            if container is not None:
                try:
                    container.remove(force=True)
                except docker.errors.APIError:
                    pass
            os.unlink(code_path)  # Delete the temporary script

E2B Sandbox

E2B provides cloud-based sandbox environments designed specifically for AI agents:

from e2b_code_interpreter import Sandbox

# Create sandbox
sandbox = Sandbox()

# Execute code
execution = sandbox.run_code("""
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'x': range(10),
    'y': [i**2 for i in range(10)]
})

plt.figure(figsize=(8, 6))
plt.plot(df['x'], df['y'])
plt.title('y = x²')
plt.savefig('/tmp/plot.png')
print(df.describe())
""")

print(execution.logs.stdout)  # Standard output, as a list of strings
print(execution.error)        # Error details, or None on success
print(execution.results)      # Rich results such as charts

# Download generated files (read as bytes, since a PNG is binary data)
content = sandbox.files.read("/tmp/plot.png", format="bytes")

# Shut down the sandbox
sandbox.kill()

E2B Advantages:

Cloud-based execution, no impact on the local environment
Millisecond-level startup
Supports persistent filesystem
Pre-installed common Python packages
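Provides custom templates

Modal Serverless Execution

Modal runs Python functions in serverless containers, which can also serve as an execution environment for agent-generated code: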

import modal

app = modal.App("agent-sandbox")

@app.function(
    image=modal.Image.debian_slim().pip_install("pandas", "numpy", "matplotlib"),
    timeout=60,
    memory=512,
)
def execute_code(code: str) -> str:
    """Execute code in a Modal serverless environment"""
    import contextlib
    import io

    # Capture everything the code prints to stdout
    output = io.StringIO()

    try:
        # redirect_stdout restores the previous stdout even if exec() raises
        with contextlib.redirect_stdout(output):
            exec(code, {"__name__": "__main__"})  # Fresh namespace for each run
        return output.getvalue()
    except Exception as e:
        return f"Error: {e}"

Security Best Practices

Defense in Depth

class SecureCodeExecutor:
    def __init__(self):
        self.forbidden_modules = [
            "os", "sys", "subprocess", "shutil", 
            "socket", "http", "urllib", "requests",
            "ctypes", "importlib",
        ]
        self.forbidden_builtins = [
            "exec", "eval", "compile", "__import__",
            "open", "input",
        ]

    def static_analysis(self, code: str) -> list:
        """Static analysis: check for dangerous code patterns"""
        warnings = []

        for module in self.forbidden_modules:
            if f"import {module}" in code or f"from {module}" in code:
                warnings.append(f"Forbidden module import: {module}")

        for builtin in self.forbidden_builtins:
            if f"{builtin}(" in code:
                warnings.append(f"Forbidden built-in function: {builtin}")

        if "while True" in code or "while 1" in code:
            warnings.append("Potential infinite loop detected")

        return warnings

    def execute(self, code: str) -> dict:
        """Securely execute code"""
        # 1. Static analysis
        warnings = self.static_analysis(code)
        if warnings:
            return {"error": f"Security check failed: {warnings}"}

        # 2. Execute in sandbox
        sandbox = DockerSandbox(timeout=30)
        return sandbox.execute(code)
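
The substring checks in `static_analysis` are easy to trip by accident (a comment that mentions `import os`, or a call to a function named `reopen(`) and easy to evade (`import json, os` contains neither `import os` nor `from os`). One possible refinement is to inspect the parsed syntax tree instead. The sketch below assumes a hypothetical helper named `ast_analysis` and reuses the same forbidden lists; it is still only a first filter, since dynamic tricks such as building a name with `getattr` get past it, so the sandbox remains the real boundary.

```python
import ast

FORBIDDEN_MODULES = {"os", "sys", "subprocess", "shutil", "socket",
                     "http", "urllib", "requests", "ctypes", "importlib"}
FORBIDDEN_CALLS = {"exec", "eval", "compile", "__import__", "open", "input"}

def ast_analysis(code: str) -> list:
    """Return warnings found by walking the syntax tree (hypothetical helper)."""
    try:
        tree = ast.parse(code)
    except SyntaxError as e:
        return [f"Syntax error: {e.msg}"]

    warnings = []
    for node in ast.walk(tree):
        # Collect imported module names from both import forms
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            names = []
        for name in names:
            root = name.split(".")[0]  # "os.path" -> "os"
            if root in FORBIDDEN_MODULES:
                warnings.append(f"Forbidden module import: {root}")

        # Direct calls by name, such as open(...) or eval(...)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FORBIDDEN_CALLS):
            warnings.append(f"Forbidden built-in function: {node.func.id}")
    return warnings

print(ast_analysis("import json, os"))
print(ast_analysis("# import os\nx = reopen(1)"))
```

The first call reports the `os` import that the substring check misses, and the second returns an empty list where the substring check would raise two false alarms.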

Security Checklist

Measure	Description
Network isolation	Disable container network access
Filesystem restriction	Read-only + limited temp space
Resource limits	CPU, memory, disk I/O caps
Timeout control	Hard limit on execution time
Module allowlist	Only allow imports from a vetted set of Python packages
Output limits	Cap output size to limit data leakage and log flooding
Audit logs	Log all executed code and results
User confirmation	Require human approval for high-risk operations
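
Two of the rows above, output limits and audit logs, need very little code. The sketch below assumes hypothetical helpers named `limit_output` and `audit_record`, a 10,000-character cap, and a result dictionary shaped like the one `DockerSandbox.execute` returns; all of these are illustrative choices rather than fixed requirements.

```python
import hashlib
import json
import time

MAX_OUTPUT_CHARS = 10_000  # Assumed cap; tune per deployment

def limit_output(text: str, max_chars: int = MAX_OUTPUT_CHARS) -> str:
    """Truncate output and state how much was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n... [{dropped} characters truncated]"

def audit_record(code: str, result: dict) -> str:
    """Build one JSON line describing an execution, suitable for an append-only log."""
    record = {
        "timestamp": time.time(),
        # The digest makes it cheap to spot identical code across runs
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "code": code,
        "exit_code": result.get("exit_code"),
        "output": limit_output(result.get("output") or ""),
    }
    return json.dumps(record)

print(limit_output("a" * 15, max_chars=10))
print(audit_record("print(1)", {"exit_code": 0, "output": "1\n"}))
```

Truncating before logging keeps a single runaway `print` loop from filling the audit log or the model's context window.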

Applications of Code Execution in Agents

Data Analysis Agent

DATA_ANALYSIS_SYSTEM = """
You are a data analysis assistant. When the user provides data or analysis requirements:
1. Write Python code for analysis
2. Use pandas for data processing
3. Use matplotlib/seaborn for visualization
4. Explain the analysis results

Available libraries: pandas, numpy, matplotlib, seaborn, scipy, sklearn
"""

Mathematical Computation Agent

MATH_SYSTEM = """
When precise computation is needed, write Python code to execute it.
Do not attempt mental arithmetic; always verify with code.

Example:
User: Calculate 2^100
Code: print(2**100)
"""
