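Code Execution and Sandboxing

Introduction

Code execution is one of the most powerful tools available to an agent. By writing and running code, an agent can perform exact calculations, analyze data, and process files. Executing arbitrary code, however, carries serious security risks, so sandboxing is an essential safeguard.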

Code Interpreter

OpenAI Code Interpreter

OpenAI provides a built-in code interpreter in ChatGPT and the Assistants API:

from openai import OpenAI

client = OpenAI()

# Create an assistant with the Code Interpreter tool enabled (Assistants API)
assistant = client.beta.assistants.create(
    name="Data Analysis Assistant",
    instructions="You are a data analysis assistant skilled at data analysis and visualization in Python.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}],
)

# Upload the file; the context manager closes the local handle afterwards
with open("data.csv", "rb") as f:
    file = client.files.create(file=f, purpose="assistants")

# Create a thread and attach the file to the user message
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze this dataset and generate charts.",
    attachments=[{"file_id": file.id, "tools": [{"type": "code_interpreter"}]}],
)

# Start a run and block until it reaches a terminal state
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

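Features:

Python and common libraries preinstalled (pandas, numpy, matplotlib, and others)
Runs in a sandbox automatically, with no access to external networks
Supports file upload and download
Can generate charts and files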

Claude Code Execution

When used as an agent, Claude can execute code directly:

# Claude Code (the CLI tool) executes code directly on the local machine,
# using its Bash tool and file-operation tools.

# With the Anthropic API, tool use lets you define your own code-execution tool:
tools = [
    {
        "name": "execute_python",
        "description": "Execute Python code and return the result",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The Python code to execute"
                }
            },
            "required": ["code"]
        }
    }
]
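
A tool definition only tells the model that the tool exists; the application still has to run the requested code and send the result back. The sketch below shows one way this could look, assuming the request arrives as a `tool_use` content block (a dict with `id`, `name`, and `input`) and the reply is a `tool_result` block. `run_python` and `handle_tool_use` are hypothetical helper names, and the subprocess call is not sandboxed at all; in a real agent it would be swapped for one of the sandboxes covered later in this article.

```python
import subprocess
import sys


def run_python(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run code in a child interpreter and return (ok, combined output).

    Hypothetical helper: a separate process, but NOT a sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} s"
    return proc.returncode == 0, proc.stdout + proc.stderr


def handle_tool_use(block: dict) -> dict:
    """Turn a tool_use block into the matching tool_result block."""
    if block["name"] != "execute_python":
        ok, output = False, f"unknown tool: {block['name']}"
    else:
        ok, output = run_python(block["input"]["code"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],  # must echo the id of the request
        "content": output,
        "is_error": not ok,
    }
```

The returned block would then be placed in the content list of the next user message, so the model can read the output and continue.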

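Sandboxing Techniques

Why a Sandbox Is Needed

Unrestricted code execution can lead to:

Filesystem damage (deleting files, overwriting system files)
Network attacks (sending malicious requests)
Resource exhaustion (infinite loops, runaway memory use)
Data leakage (reading sensitive files)
Privilege escalation (exploiting system vulnerabilities)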

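Sandbox Levels

Level	Isolation	Technology	Overhead	Security
Process	Low	subprocess + resource limits	Very low	Low
Container	Medium	Docker	Low	Medium
MicroVM / user-space kernel	High	Firecracker / gVisor	Medium	High
Full virtual machine	Highest	QEMU/KVM	High	Highest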
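
To make the lowest level concrete, here is a minimal sketch of a process-level sandbox: a child interpreter with a wall-clock timeout plus CPU and address-space limits. It assumes a POSIX system (the `resource` module is not available on Windows, and some platforms do not enforce `RLIMIT_AS`); `run_limited` and `_cap` are hypothetical names. It offers no filesystem or network isolation, which is why this level rates lowest on security.

```python
import resource
import subprocess
import sys


def _cap(limit_id: int, value: int) -> None:
    """Lower a resource limit, never exceeding the existing hard limit."""
    _, hard = resource.getrlimit(limit_id)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    # Set soft and hard together so the child cannot raise the limit again
    resource.setrlimit(limit_id, (value, value))


def run_limited(code: str, timeout: float = 5.0,
                memory_bytes: int = 512 * 1024 * 1024) -> dict:
    """Run code in a child Python process under resource limits."""
    cpu_seconds = int(timeout) + 1

    def apply_limits() -> None:
        # Executed in the child process just before the interpreter starts
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,          # wall-clock limit
            preexec_fn=apply_limits,  # CPU and memory limits
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "output": "", "timed_out": True}
    return {
        "exit_code": proc.returncode,
        "output": proc.stdout + proc.stderr,
        "timed_out": False,
    }
```

For example, `run_limited("while True: pass", timeout=1)` comes back with `timed_out` set instead of hanging the caller.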

Docker Sandbox

import os
import tempfile

import docker


class DockerSandbox:
    def __init__(self, image="python:3.11-slim", timeout=30):
        self.client = docker.from_env()
        self.image = image
        self.timeout = timeout

    def execute(self, code: str) -> dict:
        """Run code inside a locked-down Docker container."""
        # Write the code to a temporary file on the host
        with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
            f.write(code)
            code_path = f.name

        container = None  # lets the cleanup below run even if startup fails
        try:
            container = self.client.containers.run(
                self.image,
                command="python /code/script.py",
                volumes={code_path: {"bind": "/code/script.py", "mode": "ro"}},
                # Security restrictions
                network_disabled=True,       # no network access
                mem_limit="256m",            # memory cap
                cpu_period=100000,
                cpu_quota=50000,             # CPU cap (50% of one core)
                read_only=True,              # read-only root filesystem
                tmpfs={"/tmp": "size=64m"},  # small writable scratch directory
                detach=True,
            )

            # Wait for the script to finish; raises if the timeout is exceeded
            result = container.wait(timeout=self.timeout)
            logs = container.logs().decode("utf-8", errors="replace")

            return {
                "exit_code": result["StatusCode"],
                "output": logs,
                "error": None if result["StatusCode"] == 0 else logs,
            }
        except Exception as e:
            return {"exit_code": -1, "output": "", "error": str(e)}
        finally:
            # Remove the container (killing it if still running) and the temp file
            if container is not None:
                try:
                    container.remove(force=True)
                except docker.errors.APIError:
                    pass
            os.unlink(code_path)

E2B Sandbox

E2B provides cloud-hosted sandbox environments designed specifically for AI agents:

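from e2b_code_interpreter import Sandbox

# Create a sandbox
sandbox = Sandbox()

# Run code
execution = sandbox.run_code("""
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'x': range(10),
    'y': [i**2 for i in range(10)]
})

plt.figure(figsize=(8, 6))
plt.plot(df['x'], df['y'])
plt.title('y = x²')
plt.savefig('/tmp/plot.png')
print(df.describe())
""")

print(execution.logs.stdout)  # captured standard output
print(execution.text)         # text of the main result, if any
print(execution.error)        # error information, if any
print(execution.results)      # rich results such as charts

# Download the generated file
# (binary files may need to be read as bytes; check the SDK's read options)
content = sandbox.files.read("/tmp/plot.png")

# Shut down the sandbox
# (the method name depends on the SDK version; newer releases may use kill())
sandbox.close()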

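E2B advantages:

Runs in the cloud, leaving the local environment untouched
Millisecond-scale startup
Supports a persistent filesystem
Common Python packages preinstalled
Supports custom templates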

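Modal Sandbox

Modal runs functions in serverless cloud containers, which can also host code execution:

import modal

app = modal.App("agent-sandbox")


@app.function(
    image=modal.Image.debian_slim().pip_install("pandas", "numpy", "matplotlib"),
    timeout=60,
    memory=512,
)
def execute_code(code: str) -> str:
    """Execute code in Modal's serverless environment."""
    import contextlib
    import io

    # Capture everything the code prints
    output = io.StringIO()
    try:
        with contextlib.redirect_stdout(output):
            # Run in a fresh namespace so the code cannot touch this function's locals
            exec(code, {"__name__": "__main__"})
        return output.getvalue()
    except Exception as e:
        return f"Error: {e}"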

Security Best Practices

Defense in Depth

class SecureCodeExecutor:
    def __init__(self):
        self.forbidden_modules = [
            "os", "sys", "subprocess", "shutil",
            "socket", "http", "urllib", "requests",
            "ctypes", "importlib",
        ]
        self.forbidden_builtins = [
            "exec", "eval", "compile", "__import__",
            "open", "input",
        ]

    def static_analysis(self, code: str) -> list:
        """Static analysis: flag dangerous code patterns.

        Plain substring matching is easy to evade and prone to false
        positives, so treat it as a first filter, not a security boundary.
        """
        warnings = []

        for module in self.forbidden_modules:
            if f"import {module}" in code or f"from {module}" in code:
                warnings.append(f"Forbidden module import: {module}")

        for builtin in self.forbidden_builtins:
            if f"{builtin}(" in code:
                warnings.append(f"Forbidden built-in function: {builtin}")

        if "while True" in code or "while 1" in code:
            warnings.append("Potential infinite loop detected")

        return warnings

    def execute(self, code: str) -> dict:
        """Execute code with layered checks."""
        # 1. Static analysis
        warnings = self.static_analysis(code)
        if warnings:
            return {"error": f"Security check failed: {warnings}"}

        # 2. Run inside the sandbox (DockerSandbox from the Docker section)
        sandbox = DockerSandbox(timeout=30)
        return sandbox.execute(code)
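
Substring checks like the ones in `static_analysis` misfire in both directions: they flag `import osmnx` because it contains `import os`, and they miss `import  os` written with two spaces. A sketch of the same idea on top of Python's `ast` module avoids both problems. `find_violations` and the two constant names are hypothetical, and this remains only one layer: dynamic tricks such as `getattr` lookups can still get past it, so the sandbox stays the real boundary.

```python
import ast

FORBIDDEN_MODULES = {
    "os", "sys", "subprocess", "shutil", "socket",
    "http", "urllib", "requests", "ctypes", "importlib",
}
FORBIDDEN_CALLS = {"exec", "eval", "compile", "__import__", "open", "input"}


def find_violations(code: str) -> list[str]:
    """Return a list of forbidden imports and calls found in the code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        # Collect module names from both import forms
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            root = name.split(".")[0]  # "os.path" -> "os"
            if root in FORBIDDEN_MODULES:
                violations.append(f"forbidden import: {root}")

        # Direct calls to a forbidden built-in by name
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in FORBIDDEN_CALLS):
            violations.append(f"forbidden call: {node.func.id}")
    return violations
```

Because the check works on the parsed syntax tree, text inside string literals and comments is ignored, and spacing or aliasing in import statements makes no difference.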

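Security Checklist

Measure	Description
Network isolation	Disable network access for the container
Filesystem restrictions	Read-only filesystem plus a small temporary space
Resource limits	Caps on CPU, memory, and disk I/O
Timeout control	Hard limit on execution time
Module allowlist	Allow only safe Python packages
Output limits	Cap output size to prevent information leakage
Audit logging	Record all executed code and its results
User confirmation	Require human approval for high-risk operations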
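
Two of the checklist items, output limits and audit logging, need no sandbox support and can be sketched in plain Python. The helper names, the 10,000-character default, and the record fields below are all assumptions chosen for illustration; the `result` dict is assumed to have the `exit_code` and `output` keys used by the sandbox examples in this article.

```python
import hashlib
import json
import time


def truncate_output(text: str, max_chars: int = 10_000) -> str:
    """Cap output size and say how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n... [{omitted} characters truncated]"


def audit_record(code: str, result: dict) -> str:
    """Build one JSON line describing an execution, suitable for appending to a log."""
    record = {
        "timestamp": time.time(),
        # The hash makes it cheap to spot repeated or known-bad snippets
        "code_sha256": hashlib.sha256(code.encode("utf-8")).hexdigest(),
        "code": code,
        "exit_code": result.get("exit_code"),
        "output": truncate_output(result.get("output", "")),
    }
    return json.dumps(record, ensure_ascii=False)
```

Writing one JSON object per line keeps the log easy to append to and to scan later with standard tools.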

Code Execution in Agent Applications

Data Analysis Agent

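DATA_ANALYSIS_SYSTEM = """
You are a data analysis assistant. When the user provides data or an analysis request:
1. Write Python code to perform the analysis
2. Use pandas to process the data
3. Use matplotlib/seaborn to generate visualizations
4. Explain the results

Libraries available to you: pandas, numpy, matplotlib, seaborn, scipy, sklearn
"""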

Math Agent

MATH_SYSTEM = """
When an exact calculation is needed, write Python code and run it.
Do not attempt mental arithmetic; always verify with code.

Example:
User: Compute 2^100
Code: print(2**100)
"""
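
The reason for routing arithmetic through code is that Python integers have arbitrary precision and the standard `fractions` module gives exact rational arithmetic. A short illustration (the variable names are only for this example):

```python
from fractions import Fraction

# Integers never overflow, so large powers are exact
power = 2**100

# Binary floating point cannot represent 0.1 or 0.2 exactly
float_check = (0.1 + 0.2 == 0.3)

# Rational arithmetic keeps the same sum exact
exact_check = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

print(power)        # 1267650600228229401496703205376
print(float_check)  # False
print(exact_check)  # True
```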

Further Reading

Code Generation Agents - agents in software development
E2B Documentation (e2b.dev)
Modal Documentation (modal.com)
Docker Security Best Practices