System Design
Introduction
System design is the process of translating business requirements into scalable, highly available technical architectures. This article introduces system design methodology, core components, and classic case studies.
1. System Design Methodology
1.1 Four-Step Approach
Step 1: Clarify Requirements (5 min)
- Functional requirements: what are the core features?
- Non-functional requirements: QPS, latency, availability, consistency
- Constraints: user scale, data volume, budget
Step 2: Estimation (5 min)
- QPS, storage, bandwidth
- Peak vs average
Step 3: High-Level Design (15 min)
- Core components and data flow
- API design
- Data model
Step 4: Deep Dive (20 min)
- Choose 1-2 core components to explore in depth
- Trade-offs
- Scalability and fault tolerance
1.2 Load Estimation
Common benchmark numbers:
| Item | Typical Value |
|---|---|
| 1M DAU | QPS ≈ 12 (average), peak ≈ 24-60 |
| One tweet | ~250 bytes (text) |
| One image | ~200 KB (compressed) |
| One video | ~5 MB/min (compressed) |
| SSD read | ~100 μs |
| Memory read | ~100 ns |
| Network round trip (same region) | ~0.5 ms |
| Network round trip (cross-continent) | ~150 ms |
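As a sanity check on the first row, here is the back-of-envelope arithmetic, assuming each DAU issues about one request per day and a peak factor of 2-5x (both are assumptions, not fixed rules):

```python
# Back-of-envelope: 1M DAU, assuming ~1 request per user per day.
dau = 1_000_000
requests_per_user_per_day = 1        # assumption; adjust per product
seconds_per_day = 86_400

avg_qps = dau * requests_per_user_per_day / seconds_per_day
peak_low, peak_high = avg_qps * 2, avg_qps * 5   # typical peak factor 2-5x

print(f"average QPS ≈ {avg_qps:.0f}")            # ≈ 12
print(f"peak QPS ≈ {peak_low:.0f}-{peak_high:.0f}")  # ≈ 23-58
```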
2. Core Components
2.1 Load Balancer
```mermaid
graph LR
    C[Client] --> LB[Load Balancer]
    LB --> S1[Server 1]
    LB --> S2[Server 2]
    LB --> S3[Server 3]
```
L4 vs L7 Load Balancing:
| Layer | Operating Layer | Decision Basis | Performance | Representatives |
|---|---|---|---|---|
| L4 | Transport | IP + Port | High | LVS, AWS NLB |
| L7 | Application | URL, Header, Cookie | Moderate (must parse application data) | Nginx, HAProxy, AWS ALB |
Load balancing strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Round Robin | Assign in turn | Servers with similar performance |
| Weighted Round Robin | Assign by weight | Servers with different performance |
| Least Connections | Assign to least connected | Long connection scenarios |
| IP Hash | Same IP to same server | Session persistence |
| Consistent Hashing | Hash ring | Caching scenarios |
2.2 Reverse Proxy
Client → [Reverse Proxy Nginx] → Backend Server Cluster
Functions:
- Load balancing
- SSL termination
- Static file caching
- Compression (gzip/brotli)
- Rate limiting
- Security (WAF)
2.3 Caching
Cache hierarchy:
```text
Client cache (browser)
        ↓ miss
CDN cache (edge nodes)
        ↓ miss
Reverse proxy cache (Nginx)
        ↓ miss
Application cache (Redis/Memcached)
        ↓ miss
Database
```
Common Redis use cases:
```python
import json
import redis

r = redis.Redis(host='localhost', port=6379)

# Cache strategy: Cache-Aside
# (db stands in for your database client)
def get_user(user_id):
    # 1. Check cache first
    cached = r.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Cache miss, query database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    # 3. Write to cache (TTL 5 minutes)
    r.setex(f"user:{user_id}", 300, json.dumps(user))
    return user
```
Cache problems:
| Problem | Description | Solution |
|---|---|---|
| Cache penetration | Querying non-existent keys | Bloom filter / cache null values |
| Cache breakdown | Hot key expires, causing a flood of requests | Mutex lock / never expire |
| Cache avalanche | Many keys expire simultaneously | Random TTL / multi-level cache |
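A minimal sketch of two of these mitigations, reusing `r` and `db` from the Cache-Aside example above: cache a null sentinel for keys that don't exist (penetration), and take a short-lived Redis lock so only one caller rebuilds an expired hot key (breakdown). The sentinel, lock key, and TTLs are illustrative assumptions:

```python
import json
import time

NULL_SENTINEL = "__null__"   # illustrative marker for "key does not exist"

def get_user_safe(user_id):
    cached = r.get(f"user:{user_id}")
    if cached is not None:
        cached = cached.decode()
        return None if cached == NULL_SENTINEL else json.loads(cached)

    # Breakdown mitigation: only one caller rebuilds the expired hot key.
    lock_key = f"lock:user:{user_id}"
    if r.set(lock_key, "1", nx=True, ex=5):       # acquire lock, 5s safety TTL
        try:
            user = db.query("SELECT * FROM users WHERE id = %s", user_id)
            if user is None:
                # Penetration mitigation: briefly cache the "not found" result.
                r.setex(f"user:{user_id}", 60, NULL_SENTINEL)
            else:
                r.setex(f"user:{user_id}", 300, json.dumps(user))
            return user
        finally:
            r.delete(lock_key)
    else:
        time.sleep(0.05)          # brief wait, then retry via the cache
        return get_user_safe(user_id)
```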
CDN (Content Delivery Network):
- Caches static resources at global edge nodes
- Users access the nearest node
- Reduces latency and origin server load
- Representatives: Cloudflare, AWS CloudFront, Akamai
2.4 Message Queue
Asynchronous decoupling and peak shaving:
Synchronous call (coupled):

```text
User → Order Service → Inventory Service → Payment Service → Notification Service
```

Async messaging (decoupled):

```text
User → Order Service → [Message Queue] ─┬→ Inventory Service
                                        ├→ Payment Service
                                        └→ Notification Service
```
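As a concrete sketch of the decoupled flow, the order service can publish an `order_created` event that downstream services consume independently. This assumes the kafka-python client, a broker at `localhost:9092`, and an `orders` topic; all names are illustrative:

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: the order service emits an event and returns immediately.
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)
producer.send('orders', {'event': 'order_created', 'order_id': 42, 'amount': 99.5})
producer.flush()

# Consumer side: inventory/payment/notification services each subscribe
# with their own consumer group, so each group sees every order event.
consumer = KafkaConsumer(
    'orders',
    bootstrap_servers='localhost:9092',
    group_id='inventory-service',
    value_deserializer=lambda b: json.loads(b.decode('utf-8')),
)
for message in consumer:
    print(message.value)   # e.g. reserve stock for order 42
```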
2.5 Database Sharding
Vertical sharding: split tables into different databases by business domain

```text
User DB:    users, profiles
Order DB:   orders, order_items
Product DB: products, categories
```

Horizontal sharding: split rows of the same table across multiple shards, e.g. by ID range

```text
users table sharded by user_id:
Shard 0: user_id 0 - 999,999
Shard 1: user_id 1,000,000 - 1,999,999
Shard 2: user_id 2,000,000 - 2,999,999
```
Sharding strategies:
| Strategy | Method | Advantages | Disadvantages |
|---|---|---|---|
| Range sharding | By ID range | Simple, range-query friendly | Hot spot issues |
| Hash sharding | hash(key) % N | Uniform distribution | Range queries are difficult |
| Consistent hashing | Hash ring | Minimal impact during scaling | More complex implementation |
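A minimal sketch contrasting the first two strategies, using the 1M-per-shard range from the example above and an illustrative shard count of 4 (consistent hashing is covered in the next section):

```python
import hashlib

NUM_SHARDS = 4
RANGE_SIZE = 1_000_000   # matches the 1M-users-per-shard example above

def range_shard(user_id: int) -> int:
    # Range sharding: contiguous IDs stay together (range-query friendly,
    # but newly registered users all land on the newest shard -> hot spot).
    return user_id // RANGE_SIZE

def hash_shard(user_id: int) -> int:
    # Hash sharding: uniform spread, but range queries must hit every shard.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(range_shard(1_500_000))  # 1
print(hash_shard(1_500_000))   # some shard in 0..3
```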
2.6 Database Replication
```text
Write requests → [Primary (Master)]
                         │  sync/async replication
               ┌─────────┼─────────┐
               ▼         ▼         ▼
         [Replica 1] [Replica 2] [Replica 3]  ← Read requests
```
3. Consistent Hashing
Solves the problem of massive data migration when adding or removing nodes in distributed systems.
Hash ring (0 ~ 2^32 - 1):

```text
              Node A
                ●
             /     \
      key3 ·         · key1
           |         |
   Node C ●           ● Node B
             \     /
                ·
              key2
```
Rule: a key is assigned to the first node found clockwise
Adding Node D: only need to migrate keys between Node D and its counterclockwise predecessor
Removing a Node: only need to migrate that node's keys to the next clockwise node
Virtual nodes: each physical node maps to multiple virtual nodes on the hash ring for more uniform data distribution.
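The virtual-node idea is only a few lines of code. Below is a compact sketch of a hash ring using MD5 and bisect; the 100 virtual nodes per physical node is an arbitrary illustration:

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=None, vnodes=100):
        self.vnodes = vnodes          # virtual nodes per physical node
        self.ring = {}                # hash position -> physical node
        self.sorted_hashes = []       # sorted ring positions
        for node in nodes or []:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            self.ring[h] = node
            bisect.insort(self.sorted_hashes, h)

    def remove_node(self, node: str):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            del self.ring[h]
            self.sorted_hashes.remove(h)

    def get_node(self, key: str) -> str:
        # First virtual node clockwise from the key's position.
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_hashes, h) % len(self.sorted_hashes)
        return self.ring[self.sorted_hashes[idx]]

ring = ConsistentHashRing(["NodeA", "NodeB", "NodeC"])
print(ring.get_node("user:12345"))   # e.g. "NodeB"
ring.add_node("NodeD")               # only keys between NodeD and its
print(ring.get_node("user:12345"))   # predecessor move; most stay put
```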
4. Rate Limiting
4.1 Algorithms
| Algorithm | Principle | Characteristics |
|---|---|---|
| Token bucket | Tokens generated at fixed rate; requests consume tokens | Allows bursts, average rate limiting |
| Leaky bucket | Requests enter bucket; drain at fixed rate | Strict rate limiting, smooth output |
| Fixed window | Count within fixed time window | Simple, boundary burst issue |
| Sliding window | Count within sliding time window | Precise, higher memory overhead |
4.2 Implementation Example
```python
# Redis sliding window rate limiting (sorted set per user)
import time

def is_rate_limited(redis_client, user_id, limit=100, window=60):
    key = f"rate_limit:{user_id}"
    now = time.time()
    pipe = redis_client.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # remove entries outside the window
    pipe.zadd(key, {str(now): now})              # add current request
    pipe.zcard(key)                              # count requests in window
    pipe.expire(key, window)                     # clean up idle keys
    results = pipe.execute()
    count = results[2]
    return count > limit
```

Note that ZADD here uses the timestamp as both member and score, so two requests landing on the exact same float timestamp would collapse into one entry; a production version would append a unique suffix to the member.
5. Case Studies
5.1 URL Shortener Service
Requirements: map long URLs to short URLs with redirect support.
```text
Write: POST /api/shorten  {url: "https://very-long-url.com/path"}
       → {short_url: "https://short.ly/abc123"}

Read:  GET /abc123
       → 301 Redirect to https://very-long-url.com/path
```
Estimates:
- Write QPS: 100/s
- Read QPS: 10,000/s (100:1 read-to-write ratio)
- Storage: 100/s * 86,400 s/day * 365 days * 5 years * 500 B ≈ 8 TB
```mermaid
graph LR
    C[Client] --> LB[Load Balancer]
    LB --> API[API Service]
    API --> Cache[(Redis Cache)]
    API --> DB[(Database)]
    subgraph Short Code Generation
        API --> ID[ID Generator]
        ID --> B62[Base62 Encoding]
    end
```
Core design:
- Short code generation: distributed ID generator (Snowflake) → Base62 encoding
- Read optimization: Redis cache for popular URLs (LRU)
- 301 vs 302: 301 permanent redirect (browser caches) vs 302 temporary redirect (enables click tracking)
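A minimal sketch of the short-code step: take a unique integer ID (the Snowflake generator itself is assumed) and Base62-encode it:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    # Repeatedly take the remainder mod 62 and map it to a character.
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return ''.join(reversed(out))

print(base62_encode(125))            # "21"
print(base62_encode(1_234_567_890))  # a 6-character short code
```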
5.2 Chat System
Requirements: support one-to-one and group chat with delivery notifications.
```mermaid
graph TB
    C1[Client A] -->|WebSocket| GW[WebSocket Gateway]
    C2[Client B] -->|WebSocket| GW
    GW --> MS[Message Service]
    MS --> MQ[Message Queue Kafka]
    MQ --> PS[Push Service]
    MQ --> Store[Storage Service]
    PS --> GW
    Store --> DB[(Message Database)]
    MS --> Session[(Session Management Redis)]
```
Core design:
- Persistent connections: WebSocket for real-time communication
- Message storage: write fan-out (group chat writes to each member's inbox) vs read fan-out
- Offline messages: pull unread messages when user comes online
- Message ordering: monotonically increasing sequence_id per conversation
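Per-conversation ordering can be implemented as an atomic counter; a minimal sketch using Redis INCR, where the `seq:` key naming is an assumption:

```python
import redis

r = redis.Redis(host='localhost', port=6379)

def next_sequence_id(conversation_id: str) -> int:
    # INCR is atomic, so concurrent senders in the same conversation
    # get distinct, monotonically increasing sequence numbers.
    return r.incr(f"seq:{conversation_id}")

msg = {
    "conversation_id": "chat:42",
    "sequence_id": next_sequence_id("chat:42"),
    "text": "hello",
}
```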
5.3 News Feed
Requirements: users publish content that appears in followers' feeds.
Push model vs pull model:
| Model | On Write | On Read | Suited For |
|---|---|---|---|
| Push (Fan-out on write) | Write to all followers' feeds | Read directly | Few followers |
| Pull (Fan-in on read) | Write only to own timeline | Merge all followed users' posts | Many followers (celebrities) |
| Hybrid | Push for regular users, pull for celebrities | Merge pushed + real-time pulled | Production systems |
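A toy sketch of the hybrid write and read paths, with in-memory dicts standing in for the feed cache and timeline store; the 10,000-follower cutoff is an illustrative assumption:

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000   # illustrative cutoff

followers = defaultdict(set)   # user -> set of follower ids
timelines = defaultdict(list)  # user -> own posts (newest last)
feeds = defaultdict(list)      # user -> precomputed feed (push model)

def publish_post(author: int, post: str):
    timelines[author].append(post)    # always record on the author's timeline
    fans = followers[author]
    if len(fans) < CELEBRITY_THRESHOLD:
        for fan in fans:              # push: fan-out on write
            feeds[fan].append(post)
    # Celebrities skip fan-out; their posts are pulled at read time.

def read_feed(user: int, followed_celebrities: set) -> list:
    pulled = [p for c in followed_celebrities for p in timelines[c][-10:]]
    return feeds[user] + pulled       # a real system would merge by timestamp

# Usage: user 2 follows regular user 1 and celebrity 9.
followers[1].add(2)
publish_post(1, "post from a regular user")
timelines[9] = ["post from a celebrity"]   # pretend 9 has >10k followers
print(read_feed(2, followed_celebrities={9}))
```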
6. System Architecture Overview
```mermaid
graph TB
    Client[Client] --> CDN[CDN]
    Client --> DNS[DNS]
    CDN --> LB[Load Balancer L7]
    LB --> API1[API Server 1]
    LB --> API2[API Server 2]
    LB --> API3[API Server 3]
    API1 --> Cache[(Redis Cluster)]
    API1 --> MQ[Kafka]
    API1 --> DB_Primary[(DB Primary)]
    DB_Primary --> DB_Replica1[(DB Replica 1)]
    DB_Primary --> DB_Replica2[(DB Replica 2)]
    MQ --> Worker1[Worker 1]
    MQ --> Worker2[Worker 2]
    Worker1 --> DB_Primary
    Worker1 --> Search[(Elasticsearch)]
    Worker1 --> Object[(Object Storage S3)]
```
7. Design Principles Summary
| Principle | Description |
|---|---|
| Stateless services | Externalize state to cache/database for easy horizontal scaling |
| Read-write separation | Write to primary, read from replicas |
| Cache first | Put hot data in cache to reduce DB load |
| Async processing | Use message queues for non-critical paths |
| Horizontal scaling | Add machines rather than upgrading machines |
| Graceful degradation | Core functionality continues when non-core features are unavailable |
| Defensive design | Timeouts, retries, circuit breakers, rate limiting |
| Observability | Logging + Metrics + Tracing |
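To make the defensive-design row concrete, here is a minimal retry wrapper with per-call timeouts and exponential backoff; the parameters are illustrative, and a production version would add jitter and a circuit breaker:

```python
import time

def call_with_retries(fn, retries=3, timeout=1.0, backoff=0.2):
    """Call fn(timeout=...) with bounded retries and exponential backoff."""
    for attempt in range(retries):
        try:
            return fn(timeout=timeout)      # pass a hard per-call timeout
        except Exception:
            if attempt == retries - 1:
                raise                       # budget exhausted: fail fast
            time.sleep(backoff * (2 ** attempt))   # exponential backoff

# Usage with a flaky dependency (hypothetical url):
# result = call_with_retries(lambda timeout: requests.get(url, timeout=timeout))
```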
Relations to Other Topics
- See Distributed Systems for how consistency, messaging, and fault tolerance shape architecture
- See Database Systems for caching, indexing, read/write splitting, and data-model choice
- See Cloud Services for the infrastructure layer behind load balancing, object storage, and elastic resources
- See Full-Stack Development for how product requirements ultimately land in APIs, frontend interactions, and data flow
References
- "Designing Data-Intensive Applications" - Martin Kleppmann
- "System Design Interview" - Alex Xu
- "Building Microservices" - Sam Newman
- The System Design Primer: https://github.com/donnemartin/system-design-primer