
System Design

Introduction

System design is the process of translating business requirements into scalable, highly available technical architectures. This article introduces system design methodology, core components, and classic case studies.


1. System Design Methodology

1.1 Four-Step Approach

Step 1: Clarify Requirements (5 min)
  - Functional requirements: what are the core features?
  - Non-functional requirements: QPS, latency, availability, consistency
  - Constraints: user scale, data volume, budget

Step 2: Estimation (5 min)
  - QPS, storage, bandwidth
  - Peak vs average

Step 3: High-Level Design (15 min)
  - Core components and data flow
  - API design
  - Data model

Step 4: Deep Dive (20 min)
  - Choose 1-2 core components to explore in depth
  - Trade-offs
  - Scalability and fault tolerance

1.2 Load Estimation

Common benchmark numbers:

Metric                                    Order of Magnitude
1M DAU                                    QPS ≈ 12 (average), peak ≈ 24-60
One tweet (text)                          ~250 bytes
One image (compressed)                    ~200 KB
One video (compressed)                    ~5 MB/min
SSD read                                  ~100 μs
Memory read                               ~100 ns
Network round trip (same region)          ~0.5 ms
Network round trip (cross-continent)      ~150 ms
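
As a quick sanity check, the benchmark numbers above can be plugged into a small back-of-envelope calculation. The figures below (1M DAU, 10 requests per user per day, a peak factor of 3, 2 image uploads per user per day) are illustrative assumptions, not measurements:

DAU = 1_000_000            # daily active users
REQS_PER_USER = 10         # assumed requests per user per day
PEAK_FACTOR = 3            # assumed peak-to-average ratio
IMAGES_PER_USER = 2        # assumed uploads per user per day
IMAGE_SIZE_KB = 200        # compressed image size from the table above

avg_qps = DAU * REQS_PER_USER / 86_400                  # 86,400 seconds per day
peak_qps = avg_qps * PEAK_FACTOR
daily_image_storage_gb = DAU * IMAGES_PER_USER * IMAGE_SIZE_KB / 1024 / 1024

print(f"average QPS   ≈ {avg_qps:.0f}")                          # ≈ 116
print(f"peak QPS      ≈ {peak_qps:.0f}")                         # ≈ 347
print(f"image storage ≈ {daily_image_storage_gb:.0f} GB/day")    # ≈ 381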

2. Core Components

2.1 Load Balancer

graph LR
    C[Client] --> LB[Load Balancer]
    LB --> S1[Server 1]
    LB --> S2[Server 2]
    LB --> S3[Server 3]

L4 vs L7 Load Balancing:

Layer    Operating Layer    Decision Basis         Performance    Representatives
L4       Transport          IP + Port              High           LVS, AWS NLB
L7       Application        URL, Header, Cookie    Fairly high    Nginx, HAProxy, AWS ALB

Load balancing strategies:

Strategy                Description                  Use Case
Round Robin             Assign in turn               Servers with similar performance
Weighted Round Robin    Assign by weight             Servers with different performance
Least Connections       Assign to least connected    Long-lived connection scenarios
IP Hash                 Same IP to same server       Session persistence
Consistent Hashing      Hash ring                    Caching scenarios
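
A minimal sketch of two of the strategies above, assuming a fixed server list; real load balancers such as Nginx use a smoother weighted algorithm, so this only illustrates the idea:

import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]           # assumed backend addresses
weights = {"10.0.0.1": 5, "10.0.0.2": 3, "10.0.0.3": 1}  # assumed relative capacities

# Round Robin: hand out servers in turn
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Weighted Round Robin (simplified): repeat each server according to its weight
weighted_pool = itertools.cycle(
    [s for s, w in weights.items() for _ in range(w)]
)
def weighted_round_robin():
    return next(weighted_pool)

print([round_robin() for _ in range(4)])            # cycles 1 → 2 → 3 → 1
print([weighted_round_robin() for _ in range(6)])   # heavier servers appear more often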

2.2 Reverse Proxy

Client → [Reverse Proxy Nginx] → Backend Server Cluster

Functions:
- Load balancing
- SSL termination
- Static file caching
- Compression (gzip/brotli)
- Rate limiting
- Security (WAF)

2.3 Caching

Cache hierarchy:

Client cache (browser)
    ↓ miss
CDN cache (edge nodes)
    ↓ miss
Reverse proxy cache (Nginx)
    ↓ miss
Application cache (Redis/Memcached)
    ↓ miss
Database

Common Redis use cases:

import json
import redis

r = redis.Redis(host='localhost', port=6379)

# Cache strategy: Cache-Aside
def get_user(user_id):
    # 1. Check cache first
    cached = r.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # 2. Cache miss, query the database (db is an assumed data-access helper)
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # 3. Write to cache (TTL 5 minutes)
    r.setex(f"user:{user_id}", 300, json.dumps(user))
    return user

Cache problems:

Problem              Description                                       Solution
Cache penetration    Querying non-existent keys                        Bloom filter / cache null values
Cache breakdown      A hot key expires, causing a flood of requests    Mutex lock / never expire
Cache avalanche      Many keys expire simultaneously                   Random TTL / multi-level cache
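
Building on the Cache-Aside example above, a minimal sketch of one mitigation for cache penetration: cache a short-lived null marker for keys that do not exist. The marker value, TTLs, and the db helper are illustrative assumptions:

import json

NULL_MARKER = b"__null__"   # sentinel meaning "this key does not exist"

def get_user_safe(r, db, user_id):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        if cached == NULL_MARKER:
            return None                      # known-missing key, skip the database
        return json.loads(cached)

    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    if user is None:
        r.setex(key, 60, NULL_MARKER)        # cache the miss briefly (60 s)
        return None

    r.setex(key, 300, json.dumps(user))      # normal 5-minute TTL
    return user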

CDN (Content Delivery Network):

  • Caches static resources at global edge nodes
  • Users access the nearest node
  • Reduces latency and origin server load
  • Representatives: Cloudflare, AWS CloudFront, Akamai

2.4 Message Queue

Asynchronous decoupling and peak shaving:

Synchronous call (coupled):
  User → Order Service → Inventory Service → Payment Service → Notification Service

Async messaging (decoupled):
  User → Order Service → [Message Queue]
                           ├→ Inventory Service
                           ├→ Payment Service
                           └→ Notification Service
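
A minimal sketch of this decoupling using Python's standard-library queue and a worker thread as a stand-in for a real broker such as Kafka or RabbitMQ; the event shape and handler are illustrative:

import queue
import threading

order_events = queue.Queue()     # stand-in for a broker topic

def create_order(order_id):
    # The order service only enqueues an event and returns immediately.
    order_events.put({"type": "order_created", "order_id": order_id})
    return {"status": "accepted", "order_id": order_id}

def consumer():
    # Downstream services consume at their own pace (peak shaving).
    while True:
        event = order_events.get()
        if event is None:
            break
        print("inventory / payment / notification handling", event)
        order_events.task_done()

worker = threading.Thread(target=consumer, daemon=True)
worker.start()

for i in range(3):
    create_order(i)

order_events.join()      # wait until the queued events are processed
order_events.put(None)   # signal the worker to stop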

2.5 Database Sharding

Vertical sharding: split tables into different databases by business domain

User DB: users, profiles
Order DB: orders, order_items
Product DB: products, categories

Horizontal sharding: split the rows of the same table across multiple shards (for example by ID range)

users table sharded by user_id:
  Shard 0: user_id 0 - 999,999
  Shard 1: user_id 1,000,000 - 1,999,999
  Shard 2: user_id 2,000,000 - 2,999,999

Sharding strategies:

Strategy              Method           Advantages                       Disadvantages
Range sharding        By ID range      Simple, range-query friendly     Hot spot issues
Hash sharding         hash(key) % N    Uniform distribution             Range queries are difficult
Consistent hashing    Hash ring        Minimal impact during scaling    More complex implementation
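
A minimal sketch of hash-based shard routing with a fixed shard count; note that plain hash(key) % N forces large-scale data movement when N changes, which is exactly what consistent hashing (section 3) avoids:

import hashlib

NUM_SHARDS = 4   # assumed shard count

def shard_for(user_id: int) -> int:
    # Use a stable hash; Python's built-in hash() varies between processes.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def table_for(user_id: int) -> str:
    return f"users_shard_{shard_for(user_id)}"

for uid in (1, 42, 1_000_001):
    print(uid, "->", table_for(uid))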

2.6 Database Replication

Write requests → [Primary (Master)]
                      │ Sync/Async replication
                 ┌────┼────┐
                 ▼    ▼    ▼
               [Replica 1] [Replica 2] [Replica 3] ← Read requests
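
A minimal sketch of read-write splitting on top of this topology, assuming placeholder connection objects; a real router also has to deal with replication lag (for example, read-your-writes for recently written data):

import random

class ReadWriteRouter:
    """Send writes to the primary and reads to a randomly chosen replica."""

    def __init__(self, primary, replicas):
        self.primary = primary      # assumed connection to the primary
        self.replicas = replicas    # assumed connections to the replicas

    def execute(self, sql, *params):
        is_write = sql.lstrip().split()[0].upper() in {"INSERT", "UPDATE", "DELETE"}
        conn = self.primary if is_write else random.choice(self.replicas)
        return conn.execute(sql, *params)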

3. Consistent Hashing

Solves the problem of massive data migration when adding or removing nodes in distributed systems.

Hash ring (0 ~ 2^32-1):

        Node A
         /
    ----●----------
   /    key1        \
  /                  \
 |      key2    Node B|
  \        ●    ●    /
   \                /
    ----●----------
       Node C
       key3

Rule: a key is assigned to the first node found clockwise

Adding Node D: only need to migrate keys between Node D and its counterclockwise predecessor
Removing a Node: only need to migrate that node's keys to the next clockwise node

Virtual nodes: each physical node maps to multiple virtual nodes on the hash ring for more uniform data distribution.
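
A minimal sketch of a consistent hash ring with virtual nodes, using bisect to find the first node clockwise; the hash function and virtual-node count are illustrative choices:

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes          # virtual nodes per physical node
        self.ring = {}                # hash position -> physical node
        self.sorted_keys = []         # sorted hash positions on the ring
        for node in nodes:
            self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            self.ring[pos] = node
            bisect.insort(self.sorted_keys, pos)

    def remove_node(self, node: str):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            del self.ring[pos]
            self.sorted_keys.remove(pos)

    def get_node(self, key: str) -> str:
        # First virtual node clockwise from the key's position (wrapping around).
        pos = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, pos) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.get_node("user:42"))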


4. Rate Limiting

4.1 Algorithms

Algorithm         Principle                                                      Characteristics
Token bucket      Tokens are generated at a fixed rate; requests consume them    Allows bursts while limiting the average rate
Leaky bucket      Requests enter a bucket and drain at a fixed rate              Strict rate limiting, smooth output
Fixed window      Count requests within a fixed time window                      Simple, but bursts at window boundaries
Sliding window    Count requests within a sliding time window                    Precise, higher memory overhead

4.2 Implementation Example

# Redis sliding window rate limiting
import time

def is_rate_limited(redis_client, user_id, limit=100, window=60):
    key = f"rate_limit:{user_id}"
    now = time.time()

    pipe = redis_client.pipeline()
    pipe.zremrangebyscore(key, 0, now - window)  # remove expired entries
    pipe.zadd(key, {str(now): now})              # add current request
    pipe.zcard(key)                               # count
    pipe.expire(key, window)                      # set expiration
    results = pipe.execute()

    count = results[2]
    return count > limit
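
For comparison with the sliding window above, a minimal in-process token bucket sketch; a distributed version would keep the bucket state in Redis, typically behind a Lua script for atomicity:

import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=10, capacity=20)     # 10 req/s on average, bursts up to 20
print(all(bucket.allow() for _ in range(20)))  # initial burst is allowed
print(bucket.allow())                          # the 21st immediate request is rejected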

5. Case Studies

5.1 URL Shortener Service

Requirements: map long URLs to short URLs with redirect support.

Write: POST /api/shorten {url: "https://very-long-url.com/path"}
      → {short_url: "https://short.ly/abc123"}

Read:  GET /abc123
      → 301 Redirect to https://very-long-url.com/path

Estimates:
- Write QPS: 100/s
- Read QPS: 10,000/s (100:1 read-to-write ratio)
- Storage: 100/s * 86,400 * 365 * 5 years * 500 B ≈ 8 TB

graph LR
    C[Client] --> LB[Load Balancer]
    LB --> API[API Service]
    API --> Cache[(Redis Cache)]
    API --> DB[(Database)]

    subgraph Short Code Generation
        API --> ID[ID Generator]
        ID --> B62[Base62 Encoding]
    end

Core design:

  • Short code generation: distributed ID generator (Snowflake) → Base62 encoding (see the sketch after this list)
  • Read optimization: Redis cache for popular URLs (LRU)
  • 301 vs 302: 301 permanent redirect (browser caches) vs 302 temporary redirect (enables click tracking)
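
A minimal sketch of the ID → short code step, assuming a unique numeric ID already exists (for example from a Snowflake-style generator):

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def base62_decode(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

# 7 Base62 characters cover 62**7 ≈ 3.5 trillion IDs, comfortably above the estimate.
code = base62_encode(125_000_000_123)
print(code, base62_decode(code) == 125_000_000_123)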

5.2 Chat System

Requirements: support one-to-one and group chat with delivery notifications.

graph TB
    C1[Client A] -->|WebSocket| GW[WebSocket Gateway]
    C2[Client B] -->|WebSocket| GW
    GW --> MS[Message Service]
    MS --> MQ[Message Queue Kafka]
    MQ --> PS[Push Service]
    MQ --> Store[Storage Service]
    PS --> GW
    Store --> DB[(Message Database)]

    MS --> Session[(Session Management Redis)]

Core design:

  • Persistent connections: WebSocket for real-time communication
  • Message storage: write fan-out (group chat writes to each member's inbox) vs read fan-out
  • Offline messages: pull unread messages when user comes online
  • Message ordering: monotonically increasing sequence_id per conversation
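
A minimal sketch of per-conversation ordering using a Redis counter for the sequence_id; the key names and message shape are illustrative:

import json
import time
import redis

r = redis.Redis(host='localhost', port=6379)

def send_message(conversation_id: str, sender_id: str, text: str) -> dict:
    # INCR is atomic, so every message in a conversation gets a unique,
    # monotonically increasing sequence_id even with many gateway instances.
    seq = r.incr(f"conv:{conversation_id}:seq")
    message = {
        "conversation_id": conversation_id,
        "sequence_id": seq,
        "sender_id": sender_id,
        "text": text,
        "sent_at": time.time(),
    }
    # Keep messages in a sorted set so clients can pull everything after a sequence_id.
    r.zadd(f"conv:{conversation_id}:messages", {json.dumps(message): seq})
    return message

def pull_since(conversation_id: str, last_seq: int) -> list:
    raw = r.zrangebyscore(f"conv:{conversation_id}:messages", last_seq + 1, "+inf")
    return [json.loads(m) for m in raw]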

5.3 News Feed

Requirements: users publish content that appears in followers' feeds.

Push model vs pull model:

Model                      On Write                                        On Read                            Suited For
Push (fan-out on write)    Write to all followers' feeds                   Read directly                      Few followers
Pull (fan-in on read)      Write only to own timeline                      Merge all followed users' posts    Many followers (celebrities)
Hybrid                     Push for regular users, pull for celebrities    Merge pushed + real-time pulled    Production systems
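
A minimal sketch of the push (fan-out on write) side, keeping each follower's feed as a Redis list; the follower lookup and feed length limit are illustrative assumptions:

import redis

r = redis.Redis(host='localhost', port=6379)
FEED_LENGTH = 800   # keep only the most recent items per feed (assumed limit)

def get_followers(user_id: int) -> list:
    # Assumed lookup, e.g. from a follower set maintained elsewhere.
    return [int(f) for f in r.smembers(f"followers:{user_id}")]

def publish_post(author_id: int, post_id: int):
    # Fan the new post id out to every follower's feed at write time.
    pipe = r.pipeline()
    for follower_id in get_followers(author_id):
        key = f"feed:{follower_id}"
        pipe.lpush(key, post_id)
        pipe.ltrim(key, 0, FEED_LENGTH - 1)   # bound the feed size
    pipe.execute()

def read_feed(user_id: int, count: int = 20) -> list:
    return [int(p) for p in r.lrange(f"feed:{user_id}", 0, count - 1)]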

6. System Architecture Overview

graph TB
    Client[Client] --> CDN[CDN]
    Client --> DNS[DNS]
    CDN --> LB[Load Balancer L7]
    LB --> API1[API Server 1]
    LB --> API2[API Server 2]
    LB --> API3[API Server 3]

    API1 --> Cache[(Redis Cluster)]
    API1 --> MQ[Kafka]
    API1 --> DB_Master[(DB Master)]

    DB_Master --> DB_Slave1[(DB Slave 1)]
    DB_Master --> DB_Slave2[(DB Slave 2)]

    MQ --> Worker1[Worker 1]
    MQ --> Worker2[Worker 2]

    Worker1 --> DB_Master
    Worker1 --> Search[(Elasticsearch)]
    Worker1 --> Object[(Object Storage S3)]

7. Design Principles Summary

Principle                Description
Stateless services       Externalize state to cache/database for easy horizontal scaling
Read-write separation    Write to the primary, read from replicas
Cache first              Keep hot data in cache to reduce database load
Async processing         Use message queues for non-critical paths
Horizontal scaling       Add machines rather than upgrading machines
Graceful degradation     Core functionality continues when non-core features are unavailable
Defensive design         Timeouts, retries, circuit breakers, rate limiting
Observability            Logging + metrics + tracing

Relations to Other Topics

  • See Distributed Systems for how consistency, messaging, and fault tolerance shape architecture
  • See Database Systems for caching, indexing, read/write splitting, and data-model choice
  • See Cloud Services for the infrastructure layer behind load balancing, object storage, and elastic resources
  • See Full-Stack Development for how product requirements ultimately land in APIs, frontend interactions, and data flow

References

  • "Designing Data-Intensive Applications" - Martin Kleppmann
  • "System Design Interview" - Alex Xu
  • "Building Microservices" - Sam Newman
  • The System Design Primer: https://github.com/donnemartin/system-design-primer
