TCP/IP Protocol Stack
1. Five-Layer Model
┌──────────────────────────────────────────┐
│ Application Layer │ HTTP, DNS, SMTP, FTP, SSH
├──────────────────────────────────────────┤
│ Transport Layer │ TCP, UDP
├──────────────────────────────────────────┤
│ Network Layer │ IP, ICMP, ARP
├──────────────────────────────────────────┤
│ Data Link Layer │ Ethernet, Wi-Fi, PPP
├──────────────────────────────────────────┤
│ Physical Layer │ Electrical, optical, radio signals
└──────────────────────────────────────────┘
| Layer | PDU Name | Core Function | Key Protocols |
|---|---|---|---|
| Application | Message | Application functionality | HTTP, DNS, SMTP |
| Transport | Segment | End-to-end (process-to-process) delivery and multiplexing; reliability when TCP is used | TCP, UDP |
| Network | Packet | Routing, addressing | IP, ICMP |
| Data Link | Frame | Transmission between adjacent nodes | Ethernet, 802.11 |
| Physical | Bit | Physical medium transmission | RJ45, fiber optic |
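Encapsulation process: on the way down the stack, each layer prepends its own header to the PDU it receives from the layer above (the data link layer also appends a trailer): message -> segment -> packet -> frame -> bits. The receiver strips the headers in the reverse order.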
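As a toy illustration of encapsulation, the sketch below wraps an application message layer by layer. The bracketed header strings are placeholders invented for this example, not real wire formats, and `bytes.removeprefix` assumes Python 3.9 or later.

```python
def encapsulate(message: bytes) -> bytes:
    """Wrap an application message layer by layer (placeholder headers)."""
    segment = b"[TCP]" + message           # transport layer adds its header
    packet = b"[IP]" + segment             # network layer adds its header
    frame = b"[ETH]" + packet + b"[FCS]"   # data link adds header and trailer
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in reverse order on the receiving side."""
    packet = frame.removeprefix(b"[ETH]").removesuffix(b"[FCS]")
    segment = packet.removeprefix(b"[IP]")
    return segment.removeprefix(b"[TCP]")
```

Each layer treats everything it receives from above as opaque payload, which is why the headers nest.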
2. Network Layer: IP Protocol
2.1 IPv4 Addresses
32-bit addresses, typically written in dotted decimal notation (e.g., 192.168.1.1).
CIDR (Classless Inter-Domain Routing)
Notation: 192.168.1.0/24 (first 24 bits are the network address, remaining 8 bits are host addresses).
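The /24 example can be checked with Python's standard `ipaddress` module. This is a small sketch; the host address 192.168.1.77 is an arbitrary value chosen for illustration.

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")

netmask = str(net.netmask)               # mask with the first 24 bits set
total_addresses = net.num_addresses      # 2 ** (32 - 24)
usable_hosts = total_addresses - 2       # minus network and broadcast addresses
broadcast = str(net.broadcast_address)   # all host bits set to 1
is_member = ipaddress.ip_address("192.168.1.77") in net
```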
Subnetting
Dividing a large network into multiple subnets:
| Network | CIDR | Subnet Mask | Available Hosts |
|---|---|---|---|
| Class A (private) | 10.0.0.0/8 | 255.0.0.0 | \(2^{24} - 2\) |
| Class B (private) | 172.16.0.0/12 | 255.240.0.0 | \(2^{20} - 2\) |
| Class C (private) | 192.168.0.0/16 | 255.255.0.0 | \(2^{16} - 2\) |
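To make the idea of dividing a network concrete, here is a minimal sketch that splits a /24 into four /26 subnets using the standard `ipaddress` module. The parent network and the new prefix length are arbitrary example values.

```python
import ipaddress

parent = ipaddress.ip_network("192.168.1.0/24")

# Borrowing 2 host bits (/24 -> /26) yields 2 ** 2 = 4 subnets.
subnets = list(parent.subnets(new_prefix=26))

# Each subnet keeps 6 host bits: 2 ** 6 - 2 usable host addresses.
summary = [(str(s), s.num_addresses - 2) for s in subnets]
```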
2.2 IPv6
128-bit addresses, written as eight groups of four hexadecimal digits separated by colons. One run of consecutive all-zero groups may be abbreviated as "::", for example: 2001:0db8:85a3::8a2e:0370:7334
- Address space: \(2^{128} \approx 3.4 \times 10^{38}\)
- No NAT needed, IPsec support designed into the protocol, simplified fixed-length header
- Gradual adoption via IPv4/IPv6 dual-stack transition
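The textual forms of an IPv6 address can be explored with the standard `ipaddress` module. The sketch below expands and compresses the example address from this section.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:0db8:85a3::8a2e:0370:7334")

full_form = addr.exploded      # all eight groups, zero-padded to 4 digits
short_form = addr.compressed   # leading zeros dropped, zero run written as "::"
```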
2.3 NAT (Network Address Translation)
Translates private IP addresses to public IP addresses:
Internal: 192.168.1.100:5000 ──NAT──> External: 203.0.113.1:12345
Internal: 192.168.1.101:5000 ──NAT──> External: 203.0.113.1:12346
- Advantages: Conserves public IPs, hides internal network structure
- Disadvantages: Breaks end-to-end transparency, P2P communication is difficult (requires NAT traversal techniques such as STUN/TURN)
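The port-based translation shown above can be sketched as a pair of lookup tables. The `Napt` class and its method names are hypothetical, made up for this illustration; a real NAT also tracks the protocol, the remote endpoint, and mapping timeouts.

```python
class Napt:
    """Toy port-translating NAT: maps (private IP, port) to a public port."""

    def __init__(self, public_ip: str, first_port: int = 12345):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip: str, private_port: int):
        key = (private_ip, private_port)
        if key not in self.outbound:          # first packet of a new flow
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int):
        # Unsolicited inbound traffic has no mapping, which is why P2P is hard.
        return self.inbound.get(public_port)
```

The `translate_in` lookup returning nothing for an unknown port is the behavior that traversal techniques such as STUN/TURN work around.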
2.4 DHCP (Dynamic Host Configuration Protocol)
Protocol for automatic IP address assignment:
- Discover: Client broadcasts to discover DHCP servers
- Offer: Server offers an available IP address
- Request: Client requests to use that address
- Acknowledge: Server confirms
Besides the IP address, the server also supplies the subnet mask, default gateway, and DNS server.
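The four-step exchange can be modelled, very loosely, as an address pool with pending offers and confirmed leases. `ToyDhcpServer` is a hypothetical in-memory model with no networking, lease timers, or option handling; it only shows the order of the steps.

```python
import ipaddress


class ToyDhcpServer:
    """In-memory model of the Discover/Offer/Request/Acknowledge exchange."""

    def __init__(self, cidr: str):
        self.free = [str(h) for h in ipaddress.ip_network(cidr).hosts()]
        self.offers = {}   # client MAC -> offered IP (not yet confirmed)
        self.leases = {}   # client MAC -> leased IP

    def discover(self, mac: str) -> str:
        """Handle Discover and return the IP carried in the Offer."""
        ip = self.leases.get(mac) or self.offers.get(mac) or self.free.pop(0)
        self.offers[mac] = ip
        return ip

    def request(self, mac: str, ip: str) -> bool:
        """Handle Request: True models Acknowledge, False models a refusal."""
        if self.offers.get(mac) != ip:
            return False
        self.leases[mac] = self.offers.pop(mac)
        return True
```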
2.5 ICMP
Internet Control Message Protocol -- network diagnostics and error reporting:
| Type | Purpose |
|---|---|
| Echo Request/Reply | ping command |
| Destination Unreachable | Target unreachable |
| Time Exceeded | TTL expired (traceroute exploits this) |
| Redirect | Notify host of a better route |
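As an illustration of what `ping` puts on the wire, the sketch below builds an ICMP Echo Request (type 8, code 0) and computes the Internet checksum. The function names are chosen for this example; actually sending the packet would need a raw socket and elevated privileges, which is omitted here.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                       # pad to an even length
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP Echo Request: type 8, code 0, checksum over the whole message."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload
```

A receiver can validate the message by running the same checksum over the full packet, which yields 0 when nothing was corrupted.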
3. Transport Layer: TCP
3.1 TCP Features
- Connection-oriented: Connection must be established before communication
- Reliable delivery: Acknowledgments, retransmissions, and sequence numbers ensure data integrity and ordering
- Flow control: Receiver controls sending rate
- Congestion control: Detects network congestion and adjusts sending rate
- Full duplex: Simultaneous bidirectional transmission
- Byte stream: No message boundaries
3.2 Three-Way Handshake (Connection Establishment)
sequenceDiagram
participant C as Client
participant S as Server
C->>S: SYN, seq=x
Note right of S: SYN_RCVD
S->>C: SYN+ACK, seq=y, ack=x+1
Note left of C: ESTABLISHED
C->>S: ACK, seq=x+1, ack=y+1
Note right of S: ESTABLISHED
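Why three messages? Two would not be enough: the third message lets the server confirm that the client really received its SYN+ACK, which prevents stale duplicate connection requests (old SYNs) from erroneously establishing connections and ensures both sides have acknowledged each other's initial sequence number.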
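The sequence and acknowledgment numbers in the handshake follow a simple rule: a SYN consumes one sequence number, so each side acknowledges the peer's initial sequence number plus one. A minimal sketch (the function name and tuple layout are choices made for this example):

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    mod = 2 ** 32                        # sequence numbers wrap at 32 bits
    syn = ("client", "SYN", client_isn, None)
    syn_ack = ("server", "SYN+ACK", server_isn, (client_isn + 1) % mod)
    ack = ("client", "ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]
```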
3.3 Four-Way Handshake (Connection Termination)
sequenceDiagram
participant C as Client
participant S as Server
C->>S: FIN, seq=u
Note right of S: CLOSE_WAIT
S->>C: ACK, ack=u+1
Note left of C: FIN_WAIT_2
Note right of S: May still have data to send...
S->>C: FIN, seq=v
Note left of C: TIME_WAIT
C->>S: ACK, ack=v+1
Note right of S: CLOSED
Note left of C: CLOSED after 2MSL wait
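TIME_WAIT state: Lasts 2x MSL (Maximum Segment Lifetime; RFC 793 specifies 2 minutes, while Linux uses a fixed TIME_WAIT of 60 seconds in total). The wait allows the last ACK to be retransmitted if it is lost and lets old segments from the connection expire.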
3.4 TCP State Machine
stateDiagram-v2
[*] --> CLOSED
CLOSED --> LISTEN: Passive open
CLOSED --> SYN_SENT: Active open, send SYN
LISTEN --> SYN_RCVD: Receive SYN, send SYN+ACK
SYN_SENT --> ESTABLISHED: Receive SYN+ACK, send ACK
SYN_RCVD --> ESTABLISHED: Receive ACK
ESTABLISHED --> FIN_WAIT_1: Active close, send FIN
ESTABLISHED --> CLOSE_WAIT: Receive FIN, send ACK
FIN_WAIT_1 --> FIN_WAIT_2: Receive ACK
FIN_WAIT_1 --> CLOSING: Receive FIN, send ACK
FIN_WAIT_2 --> TIME_WAIT: Receive FIN, send ACK
CLOSING --> TIME_WAIT: Receive ACK
CLOSE_WAIT --> LAST_ACK: Send FIN
LAST_ACK --> CLOSED: Receive ACK
TIME_WAIT --> CLOSED: 2MSL timeout
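The state diagram can also be written as a transition table. The sketch below encodes the same transitions; the event names (`active_open`, `recv_fin`, and so on) are labels invented for this example.

```python
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN_SENT",
    ("LISTEN", "recv_syn"): "SYN_RCVD",
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin"): "CLOSING",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("CLOSING", "recv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def run(events, state="CLOSED"):
    """Apply events in order; an illegal event raises KeyError."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state
```

For example, a client that opens and then actively closes a connection passes through `active_open`, `recv_syn_ack`, `close`, `recv_ack`, `recv_fin`, and finally `timeout_2msl`.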
3.5 Flow Control
Sliding Window mechanism:
- The receiver communicates rwnd (Receive Window) to tell the sender how much data it can still accept
- Sender's send window <= rwnd
- Prevents the sender from overwhelming the receiver's buffer
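In practice the sender is limited by both flow control and congestion control: \(\text{send window} = \min(\text{rwnd}, \text{cwnd})\), where cwnd is the congestion window (determined by congestion control).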
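Since the sender is limited by the smaller of rwnd and cwnd, the amount it may still transmit is that window minus the data already in flight. A small sketch, with a hypothetical function name and all values in bytes:

```python
def sendable_bytes(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit without exceeding either window."""
    window = min(rwnd, cwnd)                  # the tighter limit wins
    return max(0, window - bytes_in_flight)   # never negative
```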
3.6 Congestion Control
TCP adjusts cwnd (Congestion Window) to probe network capacity.
Slow Start
- Initial cwnd = 1 MSS (or 10 MSS, Linux default)
- For each ACK received, cwnd += 1 MSS, so cwnd doubles every RTT (exponential growth)
- Growth until ssthresh (slow start threshold) is reached, then switches to congestion avoidance
Congestion Avoidance (AIMD: Additive Increase Multiplicative Decrease)
- Every RTT: cwnd += 1 MSS (additive increase)
- Packet loss detected: cwnd = cwnd / 2 (multiplicative decrease); after a retransmission timeout, classic TCP instead resets cwnd to 1 MSS and re-enters slow start
Fast Retransmit
Upon receiving 3 duplicate ACKs -> immediately retransmit the lost segment (don't wait for timeout).
Fast Recovery
After fast retransmit:
- ssthresh = cwnd / 2
- cwnd = ssthresh + 3 MSS
- Enter congestion avoidance (not slow start)
Figure: cwnd versus time. cwnd grows exponentially during slow start until it reaches ssthresh, then linearly during congestion avoidance; each detected loss cuts cwnd to cwnd/2, producing a sawtooth pattern.
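The sawtooth behavior can be reproduced with a simplified Reno-style simulation at RTT granularity. This is a sketch under stated assumptions: losses are injected at chosen rounds and are always detected by duplicate ACKs, the temporary +3 MSS inflation during fast recovery is ignored, and `simulate_reno` and its parameters are names made up for this example.

```python
def simulate_reno(rounds, loss_rounds, ssthresh=16.0, cwnd=1.0):
    """Return cwnd (in MSS) at the start of each RTT round."""
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:               # loss signalled by 3 duplicate ACKs
            ssthresh = max(cwnd / 2, 2.0)    # multiplicative decrease
            cwnd = ssthresh                  # fast recovery: skip slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: doubles every RTT
        else:
            cwnd += 1.0                      # congestion avoidance: +1 MSS per RTT
    return history
```

With `simulate_reno(10, {6})`, cwnd doubles up to ssthresh, grows by one MSS per round, is halved at the injected loss, and then resumes linear growth.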
Modern Congestion Control Algorithms
| Algorithm | Characteristics |
|---|---|
| Reno | Classic AIMD + fast retransmit/recovery |
| Cubic | Linux default; cwnd grows as a cubic function of time |
| BBR | Developed by Google; based on bandwidth and delay estimation rather than loss |
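| QUIC (HTTP/3) | Not an algorithm itself: a UDP-based transport that runs its congestion control (e.g., Cubic or BBR) in user space, which makes it more flexible to change |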
4. Transport Layer: UDP
4.1 Features
- Connectionless: No handshake required
- Unreliable: No acknowledgments, retransmissions, or ordering
- Low overhead: Header is only 8 bytes (vs TCP 20+ bytes)
- Preserves message boundaries: Each send corresponds to one datagram
4.2 Use Cases
| Scenario | Reason |
|---|---|
| DNS queries | Single request-response, low latency |
| Real-time audio/video | Tolerates loss, cannot tolerate delay |
| Gaming | Low latency prioritized over reliability |
| IoT sensors | Resource-constrained devices |
| QUIC (HTTP/3) | Builds reliable transport on top of UDP |
4.3 UDP Header
| Bits 0-15 | Bits 16-31 |
|---|---|
| Source Port | Dest Port |
| Length | Checksum |

The data (payload) follows these two 32-bit header words.
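The four 16-bit header fields map directly onto a `struct` format string. The sketch below packs and parses a UDP datagram in memory; the function names are chosen for this example, and the checksum is left at 0 (allowed for UDP over IPv4) rather than computed over the pseudo-header.

```python
import struct

UDP_HEADER = struct.Struct("!HHHH")   # source port, dest port, length, checksum


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = UDP_HEADER.size + len(payload)   # 8-byte header plus data
    # Checksum 0 means "not computed"; a real stack would normally fill it in.
    return UDP_HEADER.pack(src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram: bytes):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    payload = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, payload
```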
5. Ports and Multiplexing
- Well-known ports (0-1023): HTTP=80, HTTPS=443, DNS=53, SSH=22
- Registered ports (1024-49151): Application-registered
- Dynamic ports (49152-65535): Ephemeral client ports
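Multiplexing/Demultiplexing: The transport layer uses the 5-tuple (source IP, source port, destination IP, destination port, protocol) to deliver data to the correct application process. A connected TCP socket is matched on the full tuple, while an unconnected UDP socket is matched on the destination IP and port alone.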
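Demultiplexing amounts to a table lookup keyed by the 5-tuple. In this sketch, `FiveTuple` and `Demultiplexer` are hypothetical names, and the "process" is just a label standing in for a socket owner.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class Demultiplexer:
    """Maps each flow to the application process that owns it."""

    def __init__(self):
        self.table = {}

    def bind(self, flow: FiveTuple, process: str) -> None:
        self.table[flow] = process

    def deliver(self, flow: FiveTuple) -> Optional[str]:
        return self.table.get(flow)   # None if no socket matches
```

Two connections from the same client to the same server port stay separate because their source ports differ.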
6. Network Address Resolution
ARP (Address Resolution Protocol)
Resolves IP addresses to MAC addresses (data link layer addresses):
- Send ARP broadcast request: "Who has IP 192.168.1.1?"
- Target host replies: "I am 192.168.1.1, MAC is xx:xx:xx:xx:xx:xx"
- Result is cached in the ARP table
7. Routing
7.1 Routing Tables
Each host/router maintains a routing table containing:
| Field | Meaning |
|---|---|
| Destination network | Target IP subnet |
| Next hop | Which router to forward to |
| Interface | Which network interface to send from |
| Metric | Route cost; the lower value is preferred among routes to the same prefix |
Longest Prefix Match: When multiple routes match, select the one with the longest (most specific) prefix.
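Longest prefix match can be sketched with the standard `ipaddress` module. The routing table entries and next-hop addresses below are made-up example values, and the linear scan is for clarity only; real routers use tries or hardware lookup tables.

```python
import ipaddress

ROUTES = [
    ("0.0.0.0/0", "192.0.2.1"),     # default route
    ("10.0.0.0/8", "192.0.2.2"),
    ("10.1.0.0/16", "192.0.2.3"),
]


def lookup(destination: str, routes=ROUTES):
    """Return the next hop of the most specific matching route, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return next_hop
```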
7.2 Routing Protocols
| Protocol | Type | Algorithm | Scope |
|---|---|---|---|
| RIP | Distance vector | Bellman-Ford | Small networks |
| OSPF | Link state | Dijkstra | Enterprise intranets (intra-AS) |
| BGP | Path vector | Policy routing | Internet backbone (inter-AS) |
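As an illustration of the link-state approach listed for OSPF, here is a minimal Dijkstra shortest-path sketch over a link-cost graph. It shows only the path computation, not OSPF itself (no flooding, areas, or adjacency handling), and the graph format is an assumption of this example.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra: graph maps node -> {neighbor: link cost}; returns node -> cost."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist.get(node, float("inf")):
            continue                              # stale queue entry
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return dist
```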
8. Network Performance Metrics
| Metric | Definition | Unit |
|---|---|---|
| Bandwidth | Maximum link transmission rate | bps |
| Throughput | Actual transmission rate | bps |
| Latency | Total time for data from source to destination | ms |
| RTT | Round-trip time | ms |
| Packet loss rate | Lost packets / total packets | % |
Bandwidth-Delay Product (BDP): \(\text{BDP} = \text{bandwidth} \times \text{RTT}\)
BDP represents the maximum amount of unacknowledged data the network pipe can hold; the TCP window size should be >= BDP to fully utilize bandwidth.
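A quick worked calculation makes the window requirement concrete. The helper names below are chosen for this sketch, and the link figures in the usage note are arbitrary example values.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bps * rtt_seconds / 8


def max_throughput_bps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_seconds
```

For a 100 Mbps link with a 50 ms RTT, `bdp_bytes(100e6, 0.05)` is about 625,000 bytes, whereas a 65,535-byte window caps throughput at roughly 10.5 Mbps on the same path.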