TCP/IP Protocol Stack

1. Five-Layer Model

┌──────────────────────────────────────────┐
│  Application Layer                        │  HTTP, DNS, SMTP, FTP, SSH
├──────────────────────────────────────────┤
│  Transport Layer                          │  TCP, UDP
├──────────────────────────────────────────┤
│  Network Layer                            │  IP, ICMP, ARP
├──────────────────────────────────────────┤
│  Data Link Layer                          │  Ethernet, Wi-Fi, PPP
├──────────────────────────────────────────┤
│  Physical Layer                           │  Electrical, optical, radio signals
└──────────────────────────────────────────┘

| Layer | PDU Name | Core Function | Key Protocols |
|---|---|---|---|
| Application | Message | Application functionality | HTTP, DNS, SMTP |
| Transport | Segment | End-to-end reliable transport / multiplexing | TCP, UDP |
| Network | Packet | Routing, addressing | IP, ICMP |
| Data Link | Frame | Transmission between adjacent nodes | Ethernet, 802.11 |
| Physical | Bit | Physical medium transmission | RJ45, fiber optic |

Encapsulation process:

\[ \text{Data} \xrightarrow{+\text{TCP/UDP header}} \text{Segment} \xrightarrow{+\text{IP header}} \text{Packet} \xrightarrow{+\text{Frame header+trailer}} \text{Frame} \xrightarrow{} \text{Bits} \]

2. Network Layer: IP Protocol

2.1 IPv4 Addresses

32-bit addresses, typically written in dotted decimal notation (e.g., 192.168.1.1).

CIDR (Classless Inter-Domain Routing)

\[ \text{IP address} = \underbrace{\text{Network prefix}}_{n \text{ bits}} + \underbrace{\text{Host portion}}_{32-n \text{ bits}} \]

Notation: 192.168.1.0/24 (the first 24 bits are the network prefix; the remaining 8 bits identify hosts within that network).

Subnetting

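Subnetting divides a large network into multiple smaller subnets by lengthening the network prefix. The private address blocks below are the usual starting points:

| Network (CIDR) | Legacy Class | Subnet Mask | Available Hosts |
|---|---|---|---|
| 10.0.0.0/8 | Class A (private) | 255.0.0.0 | \(2^{24} - 2\) |
| 172.16.0.0/12 | Class B (private; 16 contiguous Class B networks) | 255.240.0.0 | \(2^{20} - 2\) |
| 192.168.0.0/16 | Class C (private; 256 contiguous Class C networks) | 255.255.0.0 | \(2^{16} - 2\) |

\[ \text{Available hosts} = 2^{32-n} - 2 \quad (\text{subtract network and broadcast addresses}) \]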
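
The prefix arithmetic above can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the helper name `usable_hosts` and the choice of splitting a /24 into four /26 subnets are illustrative assumptions, not part of the original notes.

```python
import ipaddress

def usable_hosts(cidr: str) -> int:
    """Usable host count for an IPv4 network: 2^(32-n) - 2.

    Assumes n <= 30; /31 and /32 networks are special cases.
    """
    net = ipaddress.ip_network(cidr)
    return 2 ** (32 - net.prefixlen) - 2

net = ipaddress.ip_network("192.168.1.0/24")
print(net.netmask)                      # 255.255.255.0
print(net.network_address)              # 192.168.1.0
print(net.broadcast_address)            # 192.168.1.255
print(usable_hosts("192.168.1.0/24"))   # 254

# Subnetting: borrow 2 host bits to split the /24 into four /26 subnets.
subnets = [str(s) for s in net.subnets(prefixlen_diff=2)]
print(subnets)
```

Each /26 subnet then has \(2^{6} - 2 = 62\) usable host addresses.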

2.2 IPv6

128-bit addresses, written as 8 groups of 4 hexadecimal digits separated by colons. One run of consecutive all-zero groups may be abbreviated as "::", for example: 2001:0db8:85a3::8a2e:0370:7334

  • Address space: \(2^{128} \approx 3.4 \times 10^{38}\)
  • No NAT needed for address conservation, IPsec support designed into the protocol, simplified fixed-length header
  • Gradual adoption via IPv4/IPv6 dual-stack transition
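
As a small illustration of the "::" abbreviation, the standard `ipaddress` module can convert between the compressed and fully expanded forms of the example address:

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:85a3::8a2e:0370:7334")

# Compressed form: leading zeros dropped, one zero run collapsed to "::".
print(addr.compressed)  # 2001:db8:85a3::8a2e:370:7334

# Exploded form: all 8 groups, 4 hex digits each.
print(addr.exploded)    # 2001:0db8:85a3:0000:0000:8a2e:0370:7334
```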

2.3 NAT (Network Address Translation)

Translates private IP addresses to public IP addresses:

Internal: 192.168.1.100:5000  ──NAT──>  External: 203.0.113.1:12345
Internal: 192.168.1.101:5000  ──NAT──>  External: 203.0.113.1:12346
  • Advantages: Conserves public IPs, hides internal network structure
  • Disadvantages: Breaks end-to-end transparency, P2P communication is difficult (requires NAT traversal techniques such as STUN/TURN)
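
The port mapping in the example above can be sketched as a pair of lookup tables. This is a toy model, not how any particular router implements NAT: the class name `Napt`, the starting port 12345, and the sequential port allocation are assumptions chosen to mirror the example.

```python
from itertools import count

class Napt:
    """Toy port-translating NAT: maps (private IP, port) to a public port."""

    def __init__(self, public_ip: str, first_port: int = 12345):
        self.public_ip = public_ip
        self._next_port = count(first_port)  # hypothetical allocation policy
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip: str, private_port: int):
        key = (private_ip, private_port)
        if key not in self.outbound:
            port = next(self._next_port)
            self.outbound[key] = port
            self.inbound[port] = key
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int):
        # Unsolicited traffic has no mapping, so it cannot be delivered.
        return self.inbound.get(public_port)

nat = Napt("203.0.113.1")
print(nat.translate_out("192.168.1.100", 5000))  # ('203.0.113.1', 12345)
print(nat.translate_out("192.168.1.101", 5000))  # ('203.0.113.1', 12346)
print(nat.translate_in(12346))                   # ('192.168.1.101', 5000)
print(nat.translate_in(40000))                   # None
```

The last call hints at why P2P is hard behind NAT: an inbound packet is only deliverable once an outbound packet has created the mapping.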

2.4 DHCP (Dynamic Host Configuration Protocol)

Protocol for automatic IP address assignment:

  1. Discover: Client broadcasts to discover DHCP servers
  2. Offer: Server offers an available IP address
  3. Request: Client requests to use that address
  4. Acknowledge: Server confirms

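Besides the IP address, DHCP also supplies the subnet mask, default gateway, and DNS servers. The address is granted as a lease for a limited time and must be renewed.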

2.5 ICMP

Internet Control Message Protocol -- network diagnostics and error reporting:

| Type | Purpose |
|---|---|
| Echo Request/Reply | ping command |
| Destination Unreachable | Target unreachable |
| Time Exceeded | TTL expired (traceroute relies on this) |
| Redirect | Notify host of a better route |
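
To make the Echo Request concrete, the sketch below builds one in memory and verifies its checksum. It only constructs bytes and does not send anything (sending ICMP normally needs a raw socket and elevated privileges). The function names and the identifier, sequence, and payload values are illustrative assumptions.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (the Internet checksum, RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"ping") -> bytes:
    # Type 8 = Echo Request, code 0; checksum is computed with its field zeroed.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1)
print(packet.hex())
print(internet_checksum(packet))  # 0: a packet with a correct checksum sums to zero
```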

3. Transport Layer: TCP

3.1 TCP Features

  • Connection-oriented: Connection must be established before communication
  • Reliable delivery: Acknowledgments, retransmissions, and sequence numbers ensure data integrity and ordering
  • Flow control: Receiver controls sending rate
  • Congestion control: Detects network congestion and adjusts sending rate
  • Full duplex: Simultaneous bidirectional transmission
  • Byte stream: No message boundaries

3.2 Three-Way Handshake (Connection Establishment)

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: SYN, seq=x
    Note right of S: SYN_RCVD
    S->>C: SYN+ACK, seq=y, ack=x+1
    Note left of C: ESTABLISHED
    C->>S: ACK, seq=x+1, ack=y+1
    Note right of S: ESTABLISHED

Why three messages? First, to prevent stale duplicate connection requests (old SYNs) from erroneously establishing connections. Second, so that each side confirms it has received the other's initial sequence number.
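
The sequence-number bookkeeping of the handshake, and the way it rejects a stale SYN, can be sketched as follows. This is a simplified model with hypothetical function names; real stacks track far more state.

```python
MOD = 2 ** 32  # sequence numbers wrap around at 32 bits

def handshake(x: int, y: int):
    """Return the three segments as (direction, flags, seq, ack) tuples."""
    return [
        ("C->S", "SYN", x, None),
        ("S->C", "SYN+ACK", y, (x + 1) % MOD),          # acknowledges client's SYN
        ("C->S", "ACK", (x + 1) % MOD, (y + 1) % MOD),  # acknowledges server's SYN
    ]

def client_reply(current_isn: int, syn_ack_ack: int) -> str:
    """Client's reaction to a SYN+ACK.

    It sends ACK only if the segment acknowledges the SYN the client
    actually has outstanding; otherwise it resets the connection.
    """
    return "ACK" if syn_ack_ack == (current_isn + 1) % MOD else "RST"

for segment in handshake(x=1000, y=5000):
    print(segment)

# A delayed old SYN with seq=400 reaches the server, which answers ack=401.
# The client's current ISN is 1000, so it refuses to complete the handshake.
print(client_reply(current_isn=1000, syn_ack_ack=401))   # RST
print(client_reply(current_isn=1000, syn_ack_ack=1001))  # ACK
```

With only two messages the server would have to treat the stale SYN as a valid connection; the third message gives the client the chance to veto it.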

3.3 Four-Way Handshake (Connection Termination)

sequenceDiagram
    participant C as Client
    participant S as Server

    C->>S: FIN, seq=u
    Note right of S: CLOSE_WAIT
    S->>C: ACK, ack=u+1
    Note left of C: FIN_WAIT_2
    Note right of S: May still have data to send...
    S->>C: FIN, seq=v
    Note left of C: TIME_WAIT
    C->>S: ACK, ack=v+1
    Note right of S: CLOSED
    Note left of C: CLOSED after 2MSL wait

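TIME_WAIT state: Lasts 2 x MSL (Maximum Segment Lifetime). RFC 793 suggests an MSL of 2 minutes, while Linux uses a fixed TIME_WAIT of 60 seconds. The wait ensures that the final ACK can be retransmitted if it is lost, and that old segments from this connection expire before the same address/port pair is reused.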

3.4 TCP State Machine

stateDiagram-v2
    [*] --> CLOSED
    CLOSED --> LISTEN: Passive open
    CLOSED --> SYN_SENT: Active open, send SYN
    LISTEN --> SYN_RCVD: Receive SYN, send SYN+ACK
    SYN_SENT --> ESTABLISHED: Receive SYN+ACK, send ACK
    SYN_RCVD --> ESTABLISHED: Receive ACK
    ESTABLISHED --> FIN_WAIT_1: Active close, send FIN
    ESTABLISHED --> CLOSE_WAIT: Receive FIN, send ACK
    FIN_WAIT_1 --> FIN_WAIT_2: Receive ACK
    FIN_WAIT_1 --> CLOSING: Receive FIN, send ACK
    FIN_WAIT_2 --> TIME_WAIT: Receive FIN, send ACK
    CLOSING --> TIME_WAIT: Receive ACK
    CLOSE_WAIT --> LAST_ACK: Send FIN
    LAST_ACK --> CLOSED: Receive ACK
    TIME_WAIT --> CLOSED: 2MSL timeout
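
The diagram above translates directly into a transition table. In this sketch the event names (`active_open`, `recv_fin`, and so on) are hypothetical labels for the edge annotations; the states and edges are the ones in the diagram.

```python
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN_SENT",
    ("LISTEN", "recv_syn"): "SYN_RCVD",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}

def run(events, state="CLOSED"):
    """Apply events in order and return the list of states visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError means an illegal transition
        path.append(state)
    return path

# Client side: active open, then active close.
client = run(["active_open", "recv_syn_ack", "close",
              "recv_ack", "recv_fin", "timeout_2msl"])
print(" -> ".join(client))

# Server side: passive open, then passive close.
server = run(["passive_open", "recv_syn", "recv_ack",
              "recv_fin", "close", "recv_ack"])
print(" -> ".join(server))
```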

3.5 Flow Control

Sliding Window mechanism:

  • The receiver communicates rwnd (Receive Window) to tell the sender how much data it can still accept
  • Sender's send window <= rwnd
  • Prevents the sender from overwhelming the receiver's buffer
\[ \text{Send window} = \min(\text{rwnd}, \text{cwnd}) \]

where cwnd is the congestion window (determined by congestion control).
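
A worked example of the formula: the amount a sender may still transmit is the send window minus what is already in flight. The parameter `bytes_in_flight` (sent but not yet acknowledged) is an assumption added here for illustration; the numbers are arbitrary.

```python
def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit right now."""
    send_window = min(rwnd, cwnd)
    return max(0, send_window - bytes_in_flight)

# Congestion-limited: cwnd is the smaller of the two windows.
print(usable_window(rwnd=65535, cwnd=14600, bytes_in_flight=8000))  # 6600

# Receiver-limited: the receiver's buffer is full, so the sender must pause.
print(usable_window(rwnd=4096, cwnd=14600, bytes_in_flight=4096))   # 0
```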

3.6 Congestion Control

TCP adjusts cwnd (Congestion Window) to probe network capacity.

Slow Start

  • Initial cwnd = 1 MSS historically (modern Linux defaults to 10 MSS)
  • For each ACK received, cwnd += 1 MSS, so cwnd doubles every RTT (exponential growth)
  • Growth continues until cwnd reaches ssthresh (slow start threshold), then TCP switches to congestion avoidance
\[ \text{cwnd}_{n+1} = \text{cwnd}_n \times 2 \quad (\text{doubles every RTT}) \]

Congestion Avoidance (AIMD: Additive Increase Multiplicative Decrease)

  • Every RTT: cwnd += 1 MSS (additive increase)
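  • Packet loss detected via duplicate ACKs: cwnd = cwnd / 2 (multiplicative decrease). On a retransmission timeout, classic TCP instead resets cwnd to 1 MSS and re-enters slow start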

Fast Retransmit

Upon receiving 3 duplicate ACKs -> immediately retransmit the lost segment (don't wait for timeout).

Fast Recovery

After fast retransmit:

  1. ssthresh = cwnd / 2
  2. cwnd = ssthresh + 3 MSS
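  3. For each further duplicate ACK, cwnd += 1 MSS; when an ACK for new data arrives, set cwnd = ssthresh and enter congestion avoidance (not slow start)
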
cwnd
 ^
 |        /\
 |       /  \        /\
 |      /    \      /  \     /
 |     /      \    /    \   /
 |    /        \  /      \ /
 |   / Slow     \/  Congestion
 |  / Start        Avoidance
 | / (exponential)  (linear)
 +──────────────────────────> Time
     ^         ^
     |         |
   ssthresh   Loss(cwnd/2)
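
The sawtooth in the figure can be reproduced with a coarse per-RTT simulation. This is a simplified Reno-style sketch under stated assumptions: cwnd is counted in whole MSS, loss is signalled by duplicate ACKs at the RTT indices in `loss_at` (a hypothetical parameter), slow start is capped at ssthresh, and the brief window inflation during fast recovery is skipped.

```python
def simulate_reno(rtts: int, ssthresh: int, loss_at: set, cwnd: int = 1):
    """Return the cwnd value (in MSS) observed at the start of each RTT."""
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)    # multiplicative decrease
            cwnd = ssthresh                 # resume in congestion avoidance
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 MSS per RTT
    return trace

print(simulate_reno(rtts=10, ssthresh=8, loss_at={6}))
# [1, 2, 4, 8, 9, 10, 11, 5, 6, 7]
```

The trace shows exponential growth up to ssthresh = 8, linear growth afterwards, and a halving to 5 after the loss at RTT 6.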

Modern Congestion Control Algorithms

| Algorithm | Characteristics |
|---|---|
| Reno | Classic AIMD + fast retransmit/recovery |
| Cubic | Linux default; cwnd grows as a cubic function of time |
| BBR | Developed by Google; based on bandwidth and delay estimation rather than loss |
| QUIC (HTTP/3) | Not an algorithm itself but a transport protocol; it runs congestion control in user space, which makes the algorithm easier to change |

4. Transport Layer: UDP

4.1 Features

  • Connectionless: No handshake required
  • Unreliable: No acknowledgments, retransmissions, or ordering
  • Low overhead: Header is only 8 bytes (vs. 20 to 60 bytes for TCP)
  • Preserves message boundaries: Each send corresponds to one datagram

4.2 Use Cases

| Scenario | Reason |
|---|---|
| DNS queries | Single request-response, low latency |
| Real-time audio/video | Tolerates loss, cannot tolerate delay |
| Gaming | Low latency prioritized over reliability |
| IoT sensors | Resource-constrained devices |
| QUIC (HTTP/3) | Builds reliable transport on top of UDP |

4.3 UDP Header

 0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|   Source Port    |   Dest Port     |
+--------+--------+--------+--------+
|   Length         |   Checksum      |
+--------+--------+--------+--------+
|          Data ...                  |
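
The four 16-bit header fields map directly onto a `struct` format string. The sketch below packs and parses a datagram in memory; the helper names are assumptions, and the checksum is left at 0 (meaning "not computed", which is permitted over IPv4) to keep the example short.

```python
import struct

# Source port, destination port, length, checksum: four big-endian 16-bit fields.
UDP_HEADER = struct.Struct("!HHHH")

def build_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = UDP_HEADER.size + len(payload)   # length covers header + data
    # A real stack computes the checksum over a pseudo-header plus the datagram.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload

def parse_datagram(datagram: bytes):
    src, dst, length, checksum = UDP_HEADER.unpack_from(datagram)
    return src, dst, length, checksum, datagram[UDP_HEADER.size:length]

dgram = build_datagram(49152, 53, b"hello")
print(len(dgram))             # 13 (8-byte header + 5-byte payload)
print(parse_datagram(dgram))  # (49152, 53, 13, 0, b'hello')
```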

5. Ports and Multiplexing

  • Well-known ports (0-1023): HTTP=80, HTTPS=443, DNS=53, SSH=22
  • Registered ports (1024-49151): Application-registered
  • Dynamic ports (49152-65535): Ephemeral client ports

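Multiplexing/Demultiplexing: The transport layer uses the 5-tuple (source IP, source port, destination IP, destination port, protocol) to deliver data to the correct application process. A TCP connection is identified by all four address fields, so one server port can serve many clients at once; an unconnected UDP socket is identified by the destination IP and port alone.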
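
Demultiplexing can be pictured as a dictionary lookup keyed by the tuple. In this sketch the table layout, the `None` wildcard convention, and the handler names are all assumptions made for illustration; operating systems use more elaborate socket lookup structures.

```python
def demux(table, proto, src_ip, src_port, dst_ip, dst_port):
    """Find the socket for an incoming segment or datagram.

    Keys are (proto, local_ip, local_port, remote_ip, remote_port).
    An exact match (established TCP connection) wins over a wildcard
    entry (listening TCP socket or unconnected UDP socket).
    """
    exact = (proto, dst_ip, dst_port, src_ip, src_port)
    wildcard = (proto, dst_ip, dst_port, None, None)
    return table.get(exact) or table.get(wildcard)

table = {
    ("tcp", "10.0.0.5", 80, "198.51.100.7", 50001): "conn-A",
    ("tcp", "10.0.0.5", 80, "198.51.100.7", 50002): "conn-B",
    ("tcp", "10.0.0.5", 80, None, None): "listener",
    ("udp", "10.0.0.5", 53, None, None): "dns-socket",
}

# Two connections from the same client to the same server port stay separate.
print(demux(table, "tcp", "198.51.100.7", 50001, "10.0.0.5", 80))  # conn-A
print(demux(table, "tcp", "198.51.100.7", 50002, "10.0.0.5", 80))  # conn-B
# A new client falls through to the listening socket.
print(demux(table, "tcp", "203.0.113.9", 40000, "10.0.0.5", 80))   # listener
# UDP: any sender reaches the socket bound to the destination port.
print(demux(table, "udp", "203.0.113.9", 40000, "10.0.0.5", 53))   # dns-socket
```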

6. Network Address Resolution

ARP (Address Resolution Protocol)

Resolves IP addresses to MAC addresses (data link layer addresses):

  1. Send ARP broadcast request: "Who has IP 192.168.1.1?"
  2. Target host replies: "I am 192.168.1.1, MAC is xx:xx:xx:xx:xx:xx"
  3. Result is cached in the ARP table

7. Routing

7.1 Routing Tables

Each host/router maintains a routing table containing:

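| Field | Meaning |
|---|---|
| Destination network | Target IP subnet |
| Next hop | Which router to forward to |
| Interface | Which network interface to send from |
| Metric | Route cost; among routes with the same prefix, the lowest metric is preferred |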

Longest Prefix Match: When multiple routes match, select the one with the longest (most specific) prefix.
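
Longest prefix match can be sketched with the standard `ipaddress` module. The routing table below is made up for illustration, and the linear scan is chosen for clarity; real routers use tries or hardware lookup tables.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop).
ROUTES = [
    ("0.0.0.0/0", "192.0.2.1"),     # default route
    ("10.0.0.0/8", "192.0.2.2"),
    ("10.1.0.0/16", "192.0.2.3"),
    ("10.1.2.0/24", "192.0.2.4"),
]

def lookup(routes, destination: str):
    """Return the next hop of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup(ROUTES, "10.1.2.3"))    # 192.0.2.4 (matches /0, /8, /16, /24)
print(lookup(ROUTES, "10.1.9.9"))    # 192.0.2.3 (matches /0, /8, /16)
print(lookup(ROUTES, "10.200.0.1"))  # 192.0.2.2 (matches /0, /8)
print(lookup(ROUTES, "8.8.8.8"))     # 192.0.2.1 (default route only)
```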

7.2 Routing Protocols

| Protocol | Type | Algorithm | Scope |
|---|---|---|---|
| RIP | Distance vector | Bellman-Ford | Small networks |
| OSPF | Link state | Dijkstra | Enterprise intranets (intra-AS) |
| BGP | Path vector | Policy-based best-path selection | Internet backbone (inter-AS) |

8. Network Performance Metrics

| Metric | Definition | Unit |
|---|---|---|
| Bandwidth | Maximum link transmission rate | bps |
| Throughput | Actual transmission rate | bps |
| Latency | One-way time for data to travel from source to destination | ms |
| RTT | Round-trip time | ms |
| Packet loss rate | Lost packets / total packets | % |

\[ \text{Latency} = t_{\text{propagation}} + t_{\text{transmission}} + t_{\text{queuing}} + t_{\text{processing}} \]
\[ t_{\text{propagation}} = \frac{\text{distance}}{\text{propagation speed in medium}}, \quad t_{\text{transmission}} = \frac{\text{packet size}}{\text{bandwidth}} \]

Bandwidth-Delay Product (BDP):

\[ \text{BDP} = \text{Bandwidth} \times \text{RTT} \]

BDP represents the maximum amount of data in flight (sent but not yet acknowledged) that the network pipe can hold; the TCP window must be at least the BDP to keep the link fully utilized.
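
A worked example of the delay and BDP formulas. The numbers (a 1500-byte packet, a 100 Mbps link, 1000 km of fiber, 50 ms RTT) and the propagation speed of 2 x 10^8 m/s are illustrative assumptions.

```python
def transmission_delay_s(packet_bytes: int, bandwidth_bps: int) -> float:
    """Time to push the packet onto the link."""
    return packet_bytes * 8 / bandwidth_bps

def propagation_delay_s(distance_m: float, speed_mps: float = 2e8) -> float:
    """Time for a bit to travel the link; 2e8 m/s is an assumed speed in fiber."""
    return distance_m / speed_mps

def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> float:
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bps * rtt_ms / 1000 / 8

print(transmission_delay_s(1500, 100_000_000) * 1000)  # about 0.12 ms
print(propagation_delay_s(1_000_000) * 1000)           # about 5 ms
print(bdp_bytes(100_000_000, 50))                      # 625000.0 bytes
```

With these numbers, a 64 KB window (the maximum without TCP window scaling) covers only about a tenth of the 625,000-byte BDP, so throughput would be capped at roughly 10 Mbps on the 100 Mbps link.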


