System Design
Introduction
System design is the process of translating business requirements into scalable, highly available technical architectures. This article introduces system design methodology, core components, and classic case studies.
1. System Design Methodology
1.1 Four-Step Approach
Step 1: Clarify Requirements (5 min)
- Functional requirements: what are the core features?
- Non-functional requirements: QPS, latency, availability, consistency
- Constraints: user scale, data volume, budget
Step 2: Estimation (5 min)
- QPS, storage, bandwidth
- Peak vs average
Step 3: High-Level Design (15 min)
- Core components and data flow
- API design
- Data model
Step 4: Deep Dive (20 min)
- Choose 1-2 core components to explore in depth
- Trade-offs
- Scalability and fault tolerance
1.2 Load Estimation
Common benchmark numbers:
| Item | Typical Value |
|---|---|
| 1M DAU | QPS ≈ 12 (average), peak ≈ 24-60 |
| One tweet | ~250 bytes (text) |
| One image | ~200 KB (compressed) |
| One video | ~5 MB/min (compressed) |
| SSD read | ~100 μs |
| Memory read | ~100 ns |
| Network round trip (same region) | ~0.5 ms |
| Network round trip (cross-continent) | ~150 ms |
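As a sanity check on the first row, here is the back-of-envelope arithmetic, assuming each DAU issues about one request per day and a peak factor of 2-5x (both are assumptions, not fixed rules):

```python
# Back-of-envelope: 1M DAU, assuming ~1 request per user per day.
dau = 1_000_000
requests_per_user_per_day = 1        # assumption; adjust per product
seconds_per_day = 86_400

avg_qps = dau * requests_per_user_per_day / seconds_per_day
peak_low, peak_high = avg_qps * 2, avg_qps * 5   # typical peak factor 2-5x

print(f"average QPS ≈ {avg_qps:.0f}")            # ≈ 12
print(f"peak QPS ≈ {peak_low:.0f}-{peak_high:.0f}")  # ≈ 23-58
```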
2. Core Components
2.1 Load Balancer
```mermaid
graph LR
    C[Client] --> LB[Load Balancer]
    LB --> S1[Server 1]
    LB --> S2[Server 2]
    LB --> S3[Server 3]
```
L4 vs L7 Load Balancing:
| Layer | Operating Layer | Decision Basis | Performance | Representatives |
|---|---|---|---|---|
| L4 | Transport | IP + Port | High | LVS, AWS NLB |
| L7 | Application | URL, Header, Cookie | Moderate (must parse application data) | Nginx, HAProxy, AWS ALB |
Load balancing strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Round Robin | Assign in turn | Servers with similar performance |
| Weighted Round Robin | Assign by weight | Servers with different performance |
| Least Connections | Assign to least connected | Long connection scenarios |
| IP Hash | Same IP to same server | Session persistence |
| Consistent Hashing | Hash ring | Caching scenarios |
2.2 Reverse Proxy
Client → [Reverse Proxy Nginx] → Backend Server Cluster
Functions:
- Load balancing
- SSL termination
- Static file caching
- Compression (gzip/brotli)
- Rate limiting
- Security (WAF)
2.3 Caching
Cache hierarchy:
```text
Client cache (browser)
        ↓ miss
CDN cache (edge nodes)
        ↓ miss
Reverse proxy cache (Nginx)
        ↓ miss
Application cache (Redis/Memcached)
        ↓ miss
Database
```
Common Redis use cases:
```python
import json
import redis

r = redis.Redis(host='localhost', port=6379)

# Cache strategy: Cache-Aside
# (db stands in for your database client)
def get_user(user_id):
    # 1. Check cache first
    cached = r.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Cache miss, query database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # 3. Write to cache (TTL 5 minutes)
    r.setex(f"user:{user_id}", 300, json.dumps(user))
    return user
```
Cache problems:
| Problem | Description | Solution |
|---|---|---|
| Cache penetration | Querying non-existent keys | Bloom filter / cache null values |
| Cache breakdown | Hot key expires, causing a flood of requests | Mutex lock / never expire |
| Cache avalanche | Many keys expire simultaneously | Random TTL / multi-level cache |
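A minimal sketch of two of these mitigations, reusing `r` and `db` from the Cache-Aside example above: cache a null sentinel for keys that don't exist (penetration), and take a short-lived Redis lock so only one caller rebuilds an expired hot key (breakdown). The sentinel, lock key, and TTLs are illustrative assumptions:

```python
import json
import time

NULL_SENTINEL = "__null__"   # illustrative marker for "key does not exist"

def get_user_safe(user_id):
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        cached = cached.decode()
        return None if cached == NULL_SENTINEL else json.loads(cached)

    # Breakdown mitigation: only one caller rebuilds the expired hot key.
    lock_key = f"lock:user:{user_id}"
    if r.set(lock_key, "1", nx=True, ex=5):       # acquire lock, 5s safety TTL
        try:
            user = db.query("SELECT * FROM users WHERE id = %s", user_id)
            if user is None:
                # Penetration mitigation: briefly cache the "not found" result.
                r.setex(f"user:{user_id}", 60, NULL_SENTINEL)
            else:
                r.setex(f"user:{user_id}", 300, json.dumps(user))
            return user
        finally:
            r.delete(lock_key)
    else:
        time.sleep(0.05)          # brief wait, then retry via the cache
        return get_user_safe(user_id)
```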
CDN (Content Delivery Network):
- Caches static resources at global edge nodes
- Users access the nearest node
- Reduces latency and origin server load
- Representatives: Cloudflare, AWS CloudFront, Akamai
2.4 Message Queue
Asynchronous decoupling and peak shaving:
Synchronous call (coupled):

```text
User → Order Service → Inventory Service → Payment Service → Notification Service
```

Async messaging (decoupled):

```text
User → Order Service → [Message Queue] ─┬→ Inventory Service
                                        ├→ Payment Service
                                        └→ Notification Service
```
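As a concrete sketch of the decoupled flow, the order service can publish an `order_created` event that downstream services consume independently. This assumes the kafka-python client, a broker at `localhost:9092`, and an `orders` topic; all names are illustrative:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: the order service emits an event and returns immediately.
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)
producer.send('orders', {'event': 'order_created', 'order_id': 42, 'amount': 99.5})
producer.flush()

# Consumer side: inventory/payment/notification services each subscribe
# with their own consumer group, so each group sees every order event.
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers='localhost:9092',
    group_id='inventory-service',
    value_deserializer=lambda b: json.loads(b.decode('utf-8')),
)
for message in consumer:
    print(message.value)   # e.g. reserve stock for order 42
```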
2.5 Database Sharding
Vertical sharding: split tables into different databases by business domain

```text
User DB:    users, profiles
Order DB:   orders, order_items
Product DB: products, categories
```

Horizontal sharding: split rows of the same table across multiple shards, e.g. by ID range

```text
users table sharded by user_id:
Shard 0: user_id 0 - 999,999
Shard 1: user_id 1,000,000 - 1,999,999
Shard 2: user_id 2,000,000 - 2,999,999
```
Sharding strategies:
| Strategy | Method | Advantages | Disadvantages |
|---|---|---|---|
| Range sharding | By ID range | Simple, range-query friendly | Hot spot issues |
| Hash sharding | hash(key) % N | Uniform distribution | Range queries are difficult |
| Consistent hashing | Hash ring | Minimal impact during scaling | More complex implementation |
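A minimal sketch contrasting the first two strategies, using the 1M-per-shard range from the example above and an illustrative shard count of 4 (consistent hashing is covered in the next section):

```python
import hashlib

NUM_SHARDS = 4
RANGE_SIZE = 1_000_000   # matches the 1M-users-per-shard example above

def range_shard(user_id: int) -> int:
    # Range sharding: contiguous IDs stay together (range-query friendly,
    # but newly registered users all land on the newest shard -> hot spot).
    return user_id // RANGE_SIZE

def hash_shard(user_id: int) -> int:
    # Hash sharding: uniform spread, but range queries must hit every shard.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(range_shard(1_500_000))  # 1
print(hash_shard(1_500_000))   # some shard in 0..3
```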
2.6 Database Replication
```text
Write requests → [Primary (Master)]
                         │  sync/async replication
               ┌─────────┼─────────┐
               ▼         ▼         ▼
         [Replica 1] [Replica 2] [Replica 3]  ← Read requests
```
3. Consistent Hashing
Solves the problem of massive data migration when adding or removing nodes in distributed systems.
Hash ring (0 ~ 2^32 - 1):

```text
              Node A
                ●
             /     \
      key3 ·         · key1
           |         |
   Node C ●           ● Node B
             \     /
                ·
              key2
```
Rule: a key is assigned to the first node found clockwise
Adding Node D: only need to migrate keys between Node D and its counterclockwise predecessor
Removing a Node: only need to migrate that node's keys to the next clockwise node
Virtual nodes: each physical node maps to multiple virtual nodes on the hash ring for more uniform data distribution.
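The virtual-node idea is only a few lines of code. Below is a compact sketch of a hash ring using MD5 and bisect; the 100 virtual nodes per physical node is an arbitrary illustration:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=None, vnodes=100):
        self.vnodes = vnodes          # virtual nodes per physical node
        self.ring = {}                # hash position -> physical node
        self.sorted_hashes = []       # sorted ring positions
        for node in nodes or []:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.sorted_hashes, h)

    def remove_node(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            del self.ring[h]
            self.sorted_hashes.remove(h)

    def get_node(self, key: str) -> str:
        # First virtual node clockwise from the key's position.
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_hashes, h) % len(self.sorted_hashes)
        return self.ring[self.sorted_hashes[idx]]

ring = ConsistentHashRing(["NodeA", "NodeB", "NodeC"])
print(ring.get_node("user:12345"))   # e.g. "NodeB"
ring.add_node("NodeD")               # only keys between NodeD and its
print(ring.get_node("user:12345"))   # predecessor move; most stay put
```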
4. Rate Limiting
4.1 Algorithms
| Algorithm | Principle | Characteristics |
|---|---|---|
| Token bucket | Tokens generated at fixed rate; requests consume tokens | Allows bursts, average rate limiting |
| Leaky bucket | Requests enter bucket; drain at fixed rate | Strict rate limiting, smooth output |
| Fixed window | Count within fixed time window | Simple, boundary burst issue |
| Sliding window | Count within sliding time window | Precise, higher memory overhead |
4.2 Implementation Example
```python
# Redis sliding window rate limiting (sorted set per user)
import time

def is_rate_limited(redis_client, user_id, limit=100, window=60):
    key = f"rate_limit:{user_id}"
    now = time.time()
    pipe = redis_client.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # remove entries outside the window
    pipe.zadd(key, {str(now): now})              # add current request
    pipe.zcard(key)                              # count requests in window
    pipe.expire(key, window)                     # clean up idle keys
    results = pipe.execute()
    count = results[2]
    return count > limit
```

Note that ZADD here uses the timestamp as both member and score, so two requests landing on the exact same float timestamp would collapse into one entry; a production version would append a unique suffix to the member.
5. Case Studies
5.1 URL Shortener Service
Requirements: map long URLs to short URLs with redirect support.
```text
Write: POST /api/shorten  {url: "https://very-long-url.com/path"}
       → {short_url: "https://short.ly/abc123"}

Read:  GET /abc123
       → 301 Redirect to https://very-long-url.com/path
```
Estimates:
- Write QPS: 100/s
- Read QPS: 10,000/s (100:1 read-to-write ratio)
- Storage: 100/s * 86,400 s/day * 365 days * 5 years * 500 B ≈ 8 TB
```mermaid
graph LR
    C[Client] --> LB[Load Balancer]
    LB --> API[API Service]
    API --> Cache[(Redis Cache)]
    API --> DB[(Database)]
    subgraph Short Code Generation
        API --> ID[ID Generator]
        ID --> B62[Base62 Encoding]
    end
```
Core design:
- Short code generation: distributed ID generator (Snowflake) → Base62 encoding
- Read optimization: Redis cache for popular URLs (LRU)
- 301 vs 302: 301 permanent redirect (browser caches) vs 302 temporary redirect (enables click tracking)
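A minimal sketch of the short-code step: take a unique integer ID (the Snowflake generator itself is assumed) and Base62-encode it:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    # Repeatedly take the remainder mod 62 and map it to a character.
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return ''.join(reversed(out))

print(base62_encode(125))            # "21"
print(base62_encode(1_234_567_890))  # a 6-character short code
```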
5.2 Chat System
Requirements: support one-to-one and group chat with delivery notifications.
```mermaid
graph TB
    C1[Client A] -->|WebSocket| GW[WebSocket Gateway]
    C2[Client B] -->|WebSocket| GW
    GW --> MS[Message Service]
    MS --> MQ[Message Queue Kafka]
    MQ --> PS[Push Service]
    MQ --> Store[Storage Service]
    PS --> GW
    Store --> DB[(Message Database)]
    MS --> Session[(Session Management Redis)]
```
Core design:
- Persistent connections: WebSocket for real-time communication
- Message storage: write fan-out (group chat writes to each member's inbox) vs read fan-out
- Offline messages: pull unread messages when user comes online
- Message ordering: monotonically increasing sequence_id per conversation
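Per-conversation ordering can be implemented as an atomic counter; a minimal sketch using Redis INCR, where the `seq:` key naming is an assumption:

```python
import redis

r = redis.Redis(host='localhost', port=6379)

def next_sequence_id(conversation_id: str) -> int:
    # INCR is atomic, so concurrent senders in the same conversation
    # get distinct, monotonically increasing sequence numbers.
    return r.incr(f"seq:{conversation_id}")

msg = {
    "conversation_id": "chat:42",
    "sequence_id": next_sequence_id("chat:42"),
    "text": "hello",
}
```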
5.3 News Feed
Requirements: users publish content that appears in followers' feeds.
Push model vs pull model:
| Model | On Write | On Read | Suited For |
|---|---|---|---|
| Push (Fan-out on write) | Write to all followers' feeds | Read directly | Few followers |
| Pull (Fan-in on read) | Write only to own timeline | Merge all followed users' posts | Many followers (celebrities) |
| Hybrid | Push for regular users, pull for celebrities | Merge pushed + real-time pulled | Production systems |
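A toy sketch of the hybrid write and read paths, with in-memory dicts standing in for the feed cache and timeline store; the 10,000-follower cutoff is an illustrative assumption:

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000   # illustrative cutoff

followers = defaultdict(set)   # user -> set of follower ids
timelines = defaultdict(list)  # user -> own posts (newest last)
feeds = defaultdict(list)      # user -> precomputed feed (push model)

def publish_post(author: int, post: str):
    timelines[author].append(post)    # always record on the author's timeline
    fans = followers[author]
    if len(fans) < CELEBRITY_THRESHOLD:
        for fan in fans:              # push: fan-out on write
            feeds[fan].append(post)
    # Celebrities skip fan-out; their posts are pulled at read time.

def read_feed(user: int, followed_celebrities: set) -> list:
    pulled = [p for c in followed_celebrities for p in timelines[c][-10:]]
    return feeds[user] + pulled       # a real system would merge by timestamp

# Usage: user 2 follows regular user 1 and celebrity 9.
followers[1].add(2)
publish_post(1, "post from a regular user")
timelines[9] = ["post from a celebrity"]   # pretend 9 has >10k followers
print(read_feed(2, followed_celebrities={9}))
```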
6. System Architecture Overview
```mermaid
graph TB
    Client[Client] --> CDN[CDN]
    Client --> DNS[DNS]
    CDN --> LB[Load Balancer L7]
    LB --> API1[API Server 1]
    LB --> API2[API Server 2]
    LB --> API3[API Server 3]
    API1 --> Cache[(Redis Cluster)]
    API1 --> MQ[Kafka]
    API1 --> DB_Primary[(DB Primary)]
    DB_Primary --> DB_Replica1[(DB Replica 1)]
    DB_Primary --> DB_Replica2[(DB Replica 2)]
    MQ --> Worker1[Worker 1]
    MQ --> Worker2[Worker 2]
    Worker1 --> DB_Primary
    Worker1 --> Search[(Elasticsearch)]
    Worker1 --> Object[(Object Storage S3)]
```
7. Design Principles Summary
| Principle | Description |
|---|---|
| Stateless services | Externalize state to cache/database for easy horizontal scaling |
| Read-write separation | Write to primary, read from replicas |
| Cache first | Put hot data in cache to reduce DB load |
| Async processing | Use message queues for non-critical paths |
| Horizontal scaling | Add machines rather than upgrading machines |
| Graceful degradation | Core functionality continues when non-core features are unavailable |
| Defensive design | Timeouts, retries, circuit breakers, rate limiting |
| Observability | Logging + Metrics + Tracing |
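To make the defensive-design row concrete, here is a minimal retry wrapper with per-call timeouts and exponential backoff; the parameters are illustrative, and a production version would add jitter and a circuit breaker:

```python
import time

def call_with_retries(fn, retries=3, timeout=1.0, backoff=0.2):
    """Call fn(timeout=...) with bounded retries and exponential backoff."""
    for attempt in range(retries):
        try:
            return fn(timeout=timeout)      # pass a hard per-call timeout
        except Exception:
            if attempt == retries - 1:
                raise                       # budget exhausted: fail fast
            time.sleep(backoff * (2 ** attempt))   # exponential backoff

# Usage with a flaky dependency (hypothetical url):
# result = call_with_retries(lambda timeout: requests.get(url, timeout=timeout))
```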
Relations to Other Topics
- See Distributed Systems for how consistency, messaging, and fault tolerance shape architecture
- See Database Systems for caching, indexing, read/write splitting, and data-model choice
- See Cloud Services for the infrastructure layer behind load balancing, object storage, and elastic resources
- See Full-Stack Development for how product requirements ultimately land in APIs, frontend interactions, and data flow
References
- "Designing Data-Intensive Applications" - Martin Kleppmann
- "System Design Interview" - Alex Xu
- "Building Microservices" - Sam Newman
- The System Design Primer: https://github.com/donnemartin/system-design-primer