
Distributed Systems

Introduction

A distributed system consists of multiple independent computing nodes collaborating over a network. This article covers core theory (CAP theorem, consistency models, consensus algorithms) and engineering practices (microservices, message queues, distributed transactions).


1. CAP Theorem

The CAP theorem states that a distributed system cannot simultaneously provide all three of the following guarantees; when a network partition occurs, it must give up either consistency or availability:

Property                  Meaning
C (Consistency)           All nodes see the same data
A (Availability)          Every request receives a (non-error) response
P (Partition Tolerance)   The system continues to operate despite network partitions

Since network partitions are unavoidable in practice, the real choice is between CP and AP:

Choice   Meaning                                           Representative Systems
CP       Guarantees consistency, sacrifices availability   ZooKeeper, etcd, HBase
AP       Guarantees availability, sacrifices consistency   Cassandra, DynamoDB, CouchDB

PACELC Extension

The PACELC theorem extends CAP: when a Partition occurs, choose between Availability and Consistency; Else (during normal, partition-free operation), there is still a trade-off between Latency and Consistency.


2. Consistency Models

2.1 Strong Consistency (Linearizability)

All operations appear to execute in a global total order, and reads always return the most recent write value.

\[ \text{write}(x, v) \prec \text{read}(x) \Rightarrow \text{read}(x) = v \]
  • Implementation: single leader + synchronous replication
  • Cost: high latency
  • Representative: Spanner (TrueTime)
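A minimal sketch of the single-leader, synchronous-replication approach mentioned above: the leader acknowledges a write only after every replica has applied it, so a subsequent read from any replica returns the latest value. The class and method names are illustrative, not taken from any particular system.

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledge the write

class Leader:
    """Single leader that replicates synchronously to all followers."""
    def __init__(self, followers):
        self.local = Replica()
        self.followers = followers

    def write(self, key, value):
        # Apply locally, then wait for every follower to acknowledge before
        # confirming to the client: reads are linearizable, but write latency
        # is bounded by the slowest replica.
        self.local.apply(key, value)
        acks = [f.apply(key, value) for f in self.followers]
        if not all(acks):
            raise RuntimeError("replication failed")
        return "OK"

    def read(self, key):
        return self.local.data.get(key)

# Usage
followers = [Replica(), Replica()]
leader = Leader(followers)
leader.write("x", 1)
assert all(f.data["x"] == 1 for f in followers)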

2.2 Eventual Consistency

If no new writes occur, all replicas will eventually converge to the same value.

\[ \forall i, j:\ \lim_{t \to \infty} \text{replica}_i(x, t) = \lim_{t \to \infty} \text{replica}_j(x, t) \]
  • Stale reads possible during the convergence window
  • Representatives: DNS, Amazon S3, Cassandra

2.3 Causal Consistency

Guarantees that causally related operations are observed in causal order by all nodes, while concurrent operations may be arbitrarily ordered.

Process A: write(x=1) → write(y=2)
Process B: if it sees y=2, it must have already seen x=1
  • Implementation: vector clocks
  • Weaker than strong consistency, stronger than eventual consistency
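A minimal vector-clock sketch, since the list above names vector clocks as the usual implementation mechanism: each process keeps one counter per process, increments its own entry on a local event, and merges clocks on receive; comparing clocks shows whether two events are causally ordered or concurrent. This is a generic illustration, not tied to any specific datastore.

class VectorClock:
    def __init__(self, node_id, num_nodes):
        self.node_id = node_id
        self.clock = [0] * num_nodes

    def local_event(self):
        # A local write/event advances this node's own counter.
        self.clock[self.node_id] += 1

    def merge(self, other):
        # On receiving a message, take the element-wise max, then tick.
        self.clock = [max(a, b) for a, b in zip(self.clock, other)]
        self.clock[self.node_id] += 1

    def happened_before(self, other):
        # self -> other iff self <= other element-wise and self != other.
        return all(a <= b for a, b in zip(self.clock, other)) and self.clock != list(other)

# Process A writes x=1 then y=2; process B sees y=2 only after merging A's clock,
# so it has necessarily observed the state that already covers x=1.
a = VectorClock(0, 2)
b = VectorClock(1, 2)
a.local_event()            # write(x=1)
a.local_event()            # write(y=2)
b.merge(a.clock)           # B receives A's updates
print(a.clock, b.clock)    # [2, 0] and [2, 1]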

3. Consensus Algorithms

The consensus problem: getting multiple nodes to agree on a single value, even when some nodes fail or messages are delayed.

3.1 Paxos

Paxos is the classic consensus algorithm; it involves three roles:

Role       Responsibility
Proposer   Proposes a value
Acceptor   Accepts or rejects proposals
Learner    Learns the chosen value

Two-phase process:

Phase 1 (Prepare):
  Proposer → Acceptor: Prepare(n)
  Acceptor → Proposer: Promise(n, accepted_value)
  (Acceptor promises not to accept proposals with numbers less than n)

Phase 2 (Accept):
  Proposer → Acceptor: Accept(n, value)
  Acceptor → Proposer: Accepted(n, value)
  (Majority of Acceptors accept → value is chosen)
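A condensed sketch of an Acceptor's state for the two phases above (single-decree Paxos, with local calls standing in for network messages; the names are illustrative, and the proposer logic is simplified):

class Acceptor:
    def __init__(self):
        self.promised_n = -1        # highest proposal number promised
        self.accepted_n = -1        # highest proposal number accepted
        self.accepted_value = None

    def prepare(self, n):
        # Phase 1: promise not to accept proposals numbered < n,
        # and report any value already accepted.
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", None, None)

    def accept(self, n, value):
        # Phase 2: accept unless a higher-numbered promise has been made.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return ("accepted", n, value)
        return ("reject", None, None)

# A proposer with proposal number 1 gathers a majority of promises, then a
# majority of accepts, so the value is chosen. (A full proposer must adopt the
# value with the highest accepted_n reported in the promises, if any.)
acceptors = [Acceptor() for _ in range(3)]
promises = [a.prepare(1) for a in acceptors]
if sum(p[0] == "promise" for p in promises) > len(acceptors) // 2:
    acks = [a.accept(1, "v") for a in acceptors]
    print("chosen" if sum(r[0] == "accepted" for r in acks) > len(acceptors) // 2 else "not chosen")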

3.2 Raft

Raft is a consensus algorithm designed as a more understandable alternative to Paxos, making it easier to implement correctly.

Leader Election:

                     Timeout
   Follower ────────────────────────→ Candidate
      ↑  ↑                                │
      │  │  Discovers current Leader      │
      │  └────────────────────────────────┤
      │                                   │ Wins majority of votes
      │     Discovers higher term         ▼
      └──────────────────────────────── Leader

Election rules:

  1. Follower times out without receiving a heartbeat → becomes Candidate
  2. Candidate increments its term number, requests votes from all nodes
  3. Wins majority of votes → becomes Leader
  4. At most one Leader per term

Log Replication:

Client → Leader: write request
Leader: append to local log
Leader → Followers: AppendEntries RPC
Followers: append to local log, reply with acknowledgment
Leader: receives majority acknowledgment → commit
Leader → Client: return success
Leader → Followers: notify of commit
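A minimal sketch of the leader-side commit rule from the flow above: an entry is committed once the leader knows it is stored on a majority of servers. This is illustrative code, not taken from the Raft paper, and it omits the additional rule that a leader may only directly commit entries from its current term.

class RaftLeaderLog:
    def __init__(self, cluster_size):
        self.log = []                  # list of (term, command)
        self.cluster_size = cluster_size
        self.match_index = {}          # follower_id -> highest replicated index
        self.commit_index = -1

    def append(self, term, command):
        self.log.append((term, command))
        return len(self.log) - 1       # index of the new entry

    def on_append_entries_ack(self, follower_id, index):
        # A follower acknowledged entries up through `index`.
        self.match_index[follower_id] = index
        self._advance_commit_index()

    def _advance_commit_index(self):
        # Commit the highest index replicated on a majority (counting the leader itself).
        for idx in range(len(self.log) - 1, self.commit_index, -1):
            replicas = 1 + sum(1 for m in self.match_index.values() if m >= idx)
            if replicas > self.cluster_size // 2:
                self.commit_index = idx
                break

# 3-node cluster: one follower's acknowledgment plus the leader is a majority.
leader = RaftLeaderLog(cluster_size=3)
i = leader.append(term=1, command="x=1")
leader.on_append_entries_ack("follower-1", i)
print(leader.commit_index)  # 0 -> the entry is committed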

Consistency guarantees:

  • If entries in two logs have the same index and term, the logs are identical in all entries up through that index
  • A Leader only appends to its log; it never overwrites or deletes its own entries
  • A Candidate can win an election only if its log is at least as up-to-date as that of each voter in the majority that elects it, so the new Leader is guaranteed to contain all committed entries

4. Microservice Architecture

4.1 Service Discovery

┌───────────────────┐
│ Service Registry  │  ← Service registration center
│ (Consul / etcd)   │
└─────────┬─────────┘
          │ Register / Heartbeat / Query
     ┌────┼────┐
     ▼    ▼    ▼
    [A]  [B]  [C]   ← Microservice instances

Approach               Implementation                         Representative
Client-side discovery  Client queries the registry directly   Netflix Eureka
Server-side discovery  Load balancer queries the registry     AWS ALB, Consul
DNS discovery          Resolves service addresses via DNS     Kubernetes Service
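A minimal in-memory sketch of the registry interaction (register / heartbeat / query) shown above. Real registries such as Consul or etcd expose this over HTTP/gRPC with TTL-based health checks; the class and method names here are purely illustrative.

import time

class ServiceRegistry:
    def __init__(self, ttl=10):
        self.ttl = ttl                  # seconds before an instance is considered dead
        self.instances = {}             # service name -> {address: last_heartbeat}

    def register(self, service, address):
        self.instances.setdefault(service, {})[address] = time.time()

    def heartbeat(self, service, address):
        if address in self.instances.get(service, {}):
            self.instances[service][address] = time.time()

    def query(self, service):
        # Return only instances whose heartbeat is still fresh.
        now = time.time()
        return [addr for addr, beat in self.instances.get(service, {}).items()
                if now - beat < self.ttl]

registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")
registry.register("orders", "10.0.0.6:8080")
print(registry.query("orders"))   # client-side discovery: pick one of these addresses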

4.2 API Gateway

The API gateway is the unified entry point for microservices:

Client → [API Gateway] → Service A
                       → Service B
                       → Service C

Functions:
- Routing: routes to different services based on URL/Header
- Authentication: unified JWT/OAuth verification
- Rate limiting: protects backend services from traffic spikes
- Circuit breaking: fails fast when a downstream service is unhealthy
- Logging: request tracing

Representative tools: Kong, Nginx, AWS API Gateway, Envoy.
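A toy sketch of the routing and authentication responsibilities listed above; the route table, header name, and upstream addresses are made up for illustration.

ROUTES = {                     # path prefix -> upstream service address (illustrative)
    "/orders":  "http://orders-service:8080",
    "/users":   "http://users-service:8080",
    "/billing": "http://billing-service:8080",
}

def handle(path, headers):
    # Authentication: every request must carry a bearer token.
    token = headers.get("Authorization", "")
    if not token.startswith("Bearer "):
        return 401, "unauthorized"

    # Routing: forward to the upstream whose prefix matches the path.
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            return 200, f"forwarded to {upstream}{path}"
    return 404, "no route"

print(handle("/orders/42", {"Authorization": "Bearer abc"}))   # (200, 'forwarded to ...')
print(handle("/orders/42", {}))                                # (401, 'unauthorized')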

4.3 Circuit Breaker Pattern

Prevents cascading failures:

State machine:

            Failure rate exceeds threshold
   Closed ─────────────────────────→ Open
      ↑                                │
      │ Probe succeeds                 │ Timeout elapses
      │                                ▼
      └───────────────────────── Half-Open
              Probe fails → back to Open

import time

class CircuitBreakerOpen(Exception):
    """Raised when a call is rejected because the breaker is open."""
    pass

class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = 0, 1, 2

    def __init__(self, failure_threshold=5, timeout=30):
        self.state = self.CLOSED
        self.failures = 0
        self.threshold = failure_threshold
        self.timeout = timeout              # seconds to stay open before probing
        self.last_failure_time = None

    def call(self, func, *args):
        if self.state == self.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = self.HALF_OPEN   # allow a single probe request
            else:
                raise CircuitBreakerOpen()

        try:
            result = func(*args)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        # A successful call (including the half-open probe) closes the breaker.
        self.state = self.CLOSED
        self.failures = 0

    def _on_failure(self):
        self.failures += 1
        self.last_failure_time = time.time()
        if self.state == self.HALF_OPEN or self.failures >= self.threshold:
            self.state = self.OPEN
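
Example usage of the class above, wrapping a flaky downstream call (the function names are placeholders):

import random

def flaky_service():
    if random.random() < 0.5:
        raise RuntimeError("downstream error")
    return "ok"

breaker = CircuitBreaker(failure_threshold=3, timeout=5)
for _ in range(10):
    try:
        print(breaker.call(flaky_service))
    except CircuitBreakerOpen:
        print("breaker open, failing fast")
    except RuntimeError:
        print("call failed")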

5. Message Queues

5.1 Kafka

Apache Kafka is a high-throughput, durable distributed message streaming platform.

Core concepts:

Concept         Description
Topic           Message category
Partition       A partition of a Topic; the unit of parallelism
Producer        Message producer
Consumer        Message consumer
Consumer Group  A group of consumers; each partition is consumed by only one consumer in the group
Offset          Position of a message within a partition
Broker          Kafka server node

Producer → Topic [P0|P1|P2] → Consumer Group
                                ├── Consumer 1 ← P0
                                ├── Consumer 2 ← P1
                                └── Consumer 3 ← P2

Characteristics:

  • Messages persisted to disk (sequential I/O, high throughput)
  • Supports message replay (re-consuming historical messages)
  • Partitions enable horizontal scaling
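A minimal producer/consumer sketch using the kafka-python client, assuming a broker at localhost:9092 and a topic named "events" (both are assumptions for illustration):

from kafka import KafkaProducer, KafkaConsumer

# Producer: messages with the same key go to the same partition,
# which preserves per-key ordering.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", key=b"user-42", value=b'{"action": "login"}')
producer.flush()

# Consumer: part of a consumer group; each partition of "events"
# is assigned to exactly one consumer in the group.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="analytics",
    auto_offset_reset="earliest",   # replay from the beginning if no committed offset
)
for message in consumer:
    print(message.partition, message.offset, message.value)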

5.2 RabbitMQ

A traditional message queue based on the AMQP protocol.

Producer → Exchange → Queue → Consumer
           (routing rules)

Exchange types:
- Direct:  exact routing key match
- Fanout:  broadcast to all bound queues
- Topic:   wildcard routing key match
- Headers: match based on message headers
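A minimal sketch with the pika client for the Direct exchange case above, assuming a local RabbitMQ broker; the exchange, queue, and routing-key names are illustrative.

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# Direct exchange: messages are routed to queues whose binding key
# exactly matches the message's routing key.
channel.exchange_declare(exchange="tasks", exchange_type="direct")
channel.queue_declare(queue="email")
channel.queue_bind(queue="email", exchange="tasks", routing_key="email")

# Producer
channel.basic_publish(exchange="tasks", routing_key="email", body=b"send welcome mail")

# Consumer
def on_message(ch, method, properties, body):
    print("got:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="email", on_message_callback=on_message)
channel.start_consuming()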

5.3 Kafka vs RabbitMQ

Dimension          Kafka                             RabbitMQ
Model              Publish-subscribe (log-based)     Message queue
Throughput         Millions of messages/sec          Tens of thousands of messages/sec
Message retention  Persistent, replayable            Deleted after consumption
Order guarantee    Ordered within a partition        Ordered within a queue
Use cases          Event streaming, logs, big data   Task queues, RPC

6. Distributed Transactions

6.1 Two-Phase Commit (2PC)

Coordinator              Participant A        Participant B
     │                         │                    │
     │──── Prepare ───────────→│                    │
     │──── Prepare ────────────┼───────────────────→│
     │←─── OK ─────────────────│                    │
     │←─── OK ─────────────────┼────────────────────│
     │                         │                    │
     │──── Commit ────────────→│                    │
     │──── Commit ─────────────┼───────────────────→│
     │←─── ACK ────────────────│                    │
     │←─── ACK ────────────────┼────────────────────│
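A condensed sketch of the coordinator's decision logic for the message flow above, with in-process calls standing in for network messages; the class and function names are illustrative.

class Participant:
    def prepare(self, txn):
        # Lock resources and write the undo/redo log, then vote.
        return True          # vote "yes"

    def commit(self, txn):
        print("commit", txn)

    def abort(self, txn):
        print("abort", txn)

def two_phase_commit(participants, txn):
    # Phase 1: ask every participant to prepare; any "no" vote aborts.
    votes = [p.prepare(txn) for p in participants]

    # Phase 2: broadcast the decision. If the coordinator crashes here,
    # participants are left blocked holding their locks.
    if all(votes):
        for p in participants:
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"

print(two_phase_commit([Participant(), Participant()], txn="order-42"))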

Problems:

  • Blocking: participants hold locks while waiting for the coordinator's decision
  • Single point of failure: if the coordinator crashes, participants remain blocked
  • Data inconsistency: if the coordinator crashes or is partitioned away after sending Commit to only some participants, their states can diverge

6.2 Saga Pattern

Splits a large transaction into a series of local transactions, each with a corresponding compensating action.

T1 → T2 → T3 → ... → Tn
                      ↓ failure at Tn
Cn-1 → ... → C3 → C2 → C1   (compensation, executed in reverse order)

Example: order flow

1. Create order (T1)          Compensate: cancel order (C1)
2. Deduct inventory (T2)      Compensate: restore inventory (C2)  
3. Charge payment (T3)        Compensate: refund (C3)
4. Send notification (T4)     Compensate: send cancellation notice (C4)

If T3 fails: execute C2 → C1

Two orchestration approaches:

Approach        Implementation                               Advantages        Disadvantages
Choreography    Event-driven; services react to each other   Loose coupling    Flow is hard to trace
Orchestration   A central orchestrator drives the flow       Flow is explicit  Orchestrator adds complexity
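A minimal orchestration-style sketch for the order flow above: each step is paired with a compensating action, and on failure the already-completed steps are compensated in reverse order. The step functions are illustrative placeholders.

def run_saga(steps):
    """steps: list of (action, compensation) pairs."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except Exception as e:
        # Compensate completed steps in reverse order.
        for compensation in reversed(done):
            compensation()
        return f"rolled back: {e}"
    return "completed"

def create_order():    print("T1: create order")
def cancel_order():    print("C1: cancel order")
def deduct_stock():    print("T2: deduct inventory")
def restore_stock():   print("C2: restore inventory")
def charge_payment():  raise RuntimeError("payment declined")   # T3 fails
def refund_payment():  print("C3: refund")

print(run_saga([(create_order, cancel_order),
                (deduct_stock, restore_stock),
                (charge_payment, refund_payment)]))
# Output: T1, T2 run, then C2 and C1 run as compensation.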

7. Distributed System Design Principles

Principle              Meaning
Idempotency            Executing the same operation multiple times has the same effect as executing it once
Timeouts and retries   Set reasonable timeouts, retry with exponential backoff
Graceful degradation   Core functionality keeps working when some features are unavailable
Observability          Logging, metrics, and tracing
Eventual consistency   Accept short-term inconsistency, guarantee eventual convergence
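A small sketch of the idempotency principle from the table above, deduplicating by a client-supplied request ID; the in-memory store and names are illustrative (production systems typically keep idempotency keys in a database or cache with a TTL).

processed = {}   # request_id -> result (acts as an idempotency-key store)

def transfer(request_id, account, amount):
    # Retries with the same request_id return the stored result
    # instead of applying the debit twice.
    if request_id in processed:
        return processed[request_id]
    result = f"debited {amount} from {account}"
    processed[request_id] = result
    return result

print(transfer("req-123", "acct-1", 50))   # applies the operation
print(transfer("req-123", "acct-1", 50))   # retry: same effect as once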

Relations to Other Topics

  • See System Design for how service decomposition, caching, messaging, and capacity planning connect to distributed scenarios
  • See Database Systems for the connection between replication, transactions, consistency, and storage design
  • See Cloud Services for the infrastructure layer that supports elasticity and resilience
  • See Computer Networks for how latency, protocol behavior, and transport reliability shape distributed systems

References

  • "Designing Data-Intensive Applications" - Martin Kleppmann
  • "Distributed Systems" - Maarten van Steen & Andrew Tanenbaum
  • Raft paper: "In Search of an Understandable Consensus Algorithm"
  • Kafka Official Documentation
