Security and Sandboxing
Overview
Security is the most critical consideration in AI Agent deployment. Agents can execute code, call APIs, and operate file systems, meaning security vulnerabilities could lead to severe consequences. This section systematically discusses agent sandboxing technology, permission models, prompt injection defense, and data protection strategies.
Sandboxing Strategies
Sandbox Technology Comparison
| Technology | Isolation Level | Performance Overhead | Security | Applicable Scenario |
|---|---|---|---|---|
| Docker | Container-level | Low | Medium | General scenarios |
| gVisor | Kernel-level | Medium | High | High security needs |
| E2B | Micro-VM | Medium | High | Code execution |
| Firecracker | Micro-VM | Low | High | AWS Lambda |
| WebAssembly | In-process | Very low | Medium | Lightweight isolation |
Docker Sandbox
# Secure Agent code execution sandbox
FROM python:3.11-slim
# Security hardening
RUN apt-get update && apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Non-root user
RUN useradd -m -s /bin/bash sandbox
USER sandbox
WORKDIR /home/sandbox
# Network access restricted (via Docker network policies)
# File system access restricted (read-only mounts)
# Resource usage limited (Docker resource limits)
Launch parameters:
docker run \
--memory=512m \
--cpus=1 \
--network=none \ # Disable network access
--read-only \ # Read-only file system
--tmpfs /tmp:size=100m \ # Temporary write space
--security-opt no-new-privileges \
--pids-limit=50 \ # Limit process count
agent-sandbox
gVisor
gVisor provides an application-level kernel that intercepts all system calls:
graph TD
A[Agent Code] --> B[gVisor Sentry]
B --> C{System Call}
C -->|Allowed| D[gVisor Kernel Implementation]
C -->|Denied| E[Return Error]
D --> F[Host Kernel]
style B fill:#fff3e0
style E fill:#ffcdd2
Advantages:
- Stronger isolation than Docker (implements a Linux kernel subset)
- Compatible with existing container images
- Used by default in Google Cloud Run
E2B (Code Interpreter SDK)
A code execution sandbox designed specifically for AI Agents:
from e2b_code_interpreter import Sandbox
sandbox = Sandbox()
# Securely execute agent-generated code
execution = sandbox.run_code("""
import pandas as pd
df = pd.read_csv('data.csv')
print(df.describe())
""")
print(execution.logs)
sandbox.close()
Features:
- Creates a fresh micro-VM for each execution
- Supports file upload and download
- Built-in timeout and resource limits
- Supports Python, JavaScript, R, and other languages
Permission Model
Principle of Least Privilege
Agents should only possess the minimum permissions needed to complete a task:
Capability-based Permissions
# Capability-based permission model
class AgentCapabilities:
def __init__(self):
self.capabilities = {
"file_read": {
"allowed_paths": ["/workspace/*"],
"denied_paths": ["/etc/*", "/root/*"],
},
"file_write": {
"allowed_paths": ["/workspace/output/*"],
"max_file_size": "10MB",
},
"network": {
"allowed_domains": ["api.openai.com", "pypi.org"],
"denied_ports": [22, 23, 3389],
},
"code_execution": {
"allowed_languages": ["python"],
"timeout": 30, # seconds
"max_memory": "512MB",
},
}
def check_permission(self, action, resource):
cap = self.capabilities.get(action)
if cap is None:
return False # Default deny
return self._match_resource(cap, resource)
Tiered Permissions
| Permission Level | Allowed Operations | Approval Required |
|---|---|---|
| Level 0 | Read-only (search, read) | None |
| Level 1 | Create (new files, new messages) | None |
| Level 2 | Modify (edit files, update data) | First-time confirmation |
| Level 3 | Delete/Send (delete files, send emails) | Every-time confirmation |
| Level 4 | System operations (install software, modify config) | Human approval |
Prompt Injection Defense
Attack Types
graph TD
A[Prompt Injection Attacks] --> B[Direct Injection]
A --> C[Indirect Injection]
B --> B1[User directly inputs malicious instructions]
C --> C1[Web page content contains hidden instructions]
C --> C2[Documents embed malicious text]
C --> C3[Tool return values contain injection]
style A fill:#ffcdd2
style C fill:#fff3e0
Input Validation
class InputValidator:
# Suspicious pattern detection
SUSPICIOUS_PATTERNS = [
r"ignore\s+(previous|above|all)\s+instructions",
r"you\s+are\s+now\s+",
r"new\s+instructions?\s*:",
r"system\s*prompt\s*:",
r"</?(system|user|assistant)>",
]
def validate(self, user_input: str) -> tuple[bool, str]:
for pattern in self.SUSPICIOUS_PATTERNS:
if re.search(pattern, user_input, re.IGNORECASE):
return False, f"Suspicious pattern detected: {pattern}"
return True, "OK"
Output Filtering
Security checks before agent output reaches the user:
- PII detection: Detect and redact personally identifiable information
- Content policy: Filter harmful or non-compliant content
- Format validation: Ensure output format meets expectations
- Link checking: Verify the safety of generated URLs
Defense Strategy Summary
| Strategy | Defense Layer | Implementation Difficulty | Effectiveness |
|---|---|---|---|
| Input validation | Input layer | Low | Medium |
| System prompt hardening | Prompt layer | Low | Medium |
| Output filtering | Output layer | Medium | High |
| Tool permission control | Execution layer | Medium | High |
| Sandbox isolation | Infrastructure layer | High | High |
| Multi-model review | Review layer | High | Very high |
PII Protection
Personal Information Types
| Type | Example | Risk Level |
|---|---|---|
| Name | John Smith | Medium |
| National ID | 110101199001011234 | High |
| Phone number | 13800138000 | High |
| Email address | user@example.com | Medium |
| Bank card number | 6222021234567890 | Very high |
| Address | 123 Main Street... | Medium |
Redaction Strategy
# PII redaction example
def redact_pii(text):
# Phone numbers
text = re.sub(r'1[3-9]\d{9}', '[PHONE]', text)
# National ID numbers
text = re.sub(r'\d{17}[\dXx]', '[ID_NUMBER]', text)
# Email addresses
text = re.sub(r'\S+@\S+\.\S+', '[EMAIL]', text)
# Bank card numbers
text = re.sub(r'\d{16,19}', '[CARD_NUMBER]', text)
return text
Content Filtering
Multi-layer Filtering Architecture
User Input → Input Filter → Agent Processing → Output Filter → User
↓ ↓
Reject/Modify Redact/Filter
Filtering Rules
- Harmful content: Violence, hate speech, pornography
- Illegal content: Criminal solicitation, fraud
- Enterprise policies: Competitor information, confidential data
- Compliance requirements: Industry-specific content restrictions
Security Best Practices
- Defense in depth: Multiple security layers, not relying on a single line of defense
- Default deny: Operations not explicitly allowed are denied by default
- Audit logging: Record all agent operations for post-hoc review
- Regular penetration testing: Conduct periodic security assessments
- Emergency stop: Implement kill switch mechanisms
- Security updates: Promptly update dependencies and security patches
References
- Greshake, K., et al. "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection." AISec 2023.
- E2B. "Code Interpreter SDK." 2024.
- Google. "gVisor: Container Runtime Sandbox." 2024.
Cross-references: - Code execution sandbox → Code Execution and Sandboxing - Reliability → Reliability and Robustness - Alignment safety → Alignment and Safety Strategies