
Memory Management

1. Address Spaces

1.1 Logical Address vs Physical Address

| Concept | Meaning | Generated by |
| --- | --- | --- |
| Logical address (virtual address) | Address from the CPU/program's perspective | CPU |
| Physical address | Actual address on the memory hardware | MMU translation |

The MMU (Memory Management Unit) translates logical addresses to physical addresses at runtime, completely transparent to programs.

1.2 Process Address Space Layout

Typical Linux process virtual address space (64-bit):

High addr ┌─────────────────┐
          │  Kernel space    │  Inaccessible to user
          ├─────────────────┤  0xFFFF800000000000
          │  Stack ↓         │  Grows downward
          │                  │
          │  (free region)   │
          │                  │
          │  mmap region     │  Shared libraries, mmap files
          │                  │
          │  Heap ↑          │  Grows upward (malloc/new)
          ├─────────────────┤
          │  BSS (uninit.)   │
          │  Data (init.)    │
          │  Text (code)     │  Read-only + executable
Low addr  └─────────────────┘  0x0000000000400000

2. Paging

2.1 Basic Concepts

Divide both the virtual address space and physical memory into fixed-size pages / frames:

\[ \text{Virtual address} = \underbrace{\text{VPN (Virtual Page Number)}}_{p} \;\|\; \underbrace{\text{Offset}}_{d} \]
\[ \text{Physical address} = \underbrace{\text{PFN (Physical Frame Number)}}_{f} \;\|\; \underbrace{\text{Offset}}_{d} \]

The page table performs the VPN -> PFN mapping. The offset remains unchanged.

Common page sizes: 4 KB (\(2^{12}\) bytes); huge pages 2 MB / 1 GB.
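The split-and-translate step can be sketched in a few lines. This is an illustrative model, not any real kernel's code; the names `split_address` and `translate` are made up here.

```python
# Sketch: splitting a virtual address into VPN and offset for 4 KB pages.

PAGE_SHIFT = 12                 # 4 KB pages: 2^12 bytes
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1

def split_address(vaddr):
    """Return (VPN, offset) for a virtual address."""
    return vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK

def translate(vaddr, page_table):
    """Map VPN -> PFN via the page table; the offset passes through unchanged."""
    vpn, offset = split_address(vaddr)
    pfn = page_table[vpn]       # an unmapped VPN would raise here (a "page fault")
    return (pfn << PAGE_SHIFT) | offset
```

For example, `split_address(0x12345)` yields VPN `0x12` and offset `0x345`; mapping VPN `0x12` to PFN `0x7` makes the physical address `0x7345`.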

2.2 Single-Level Page Table

Problem -- page table too large:

  • 32-bit address space + 4 KB pages -> \(2^{20}\) PTEs -> 4 MB per process at 4 bytes each
  • 64-bit address space -> unacceptable

2.3 Multi-Level Page Table

Allocate page table pages on demand; unused virtual address ranges do not have allocated page tables.

Take the x86-64 four-level page table (48-bit virtual addresses) as an example:

Virtual address (48 bits):
┌──────┬──────┬──────┬──────┬────────┐
│ PML4 │ PDPT │  PD  │  PT  │ Offset │
│ 9bit │ 9bit │ 9bit │ 9bit │ 12bit  │
└──┬───┴──┬───┴──┬───┴──┬───┴────────┘
   │      │      │      │
   ▼      ▼      ▼      ▼
 PML4 → PDPT → PD → PT → Physical frame

Advantages:

  • Space-efficient -- only allocates page table pages for used address regions
  • Supports a full 48-bit virtual address space even though most processes use only a tiny fraction of it

Cost:

  • One address translation requires 4 memory accesses (one per level)
  • The importance of TLB becomes even more prominent
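The field decomposition in the diagram above is just shifting and masking. A minimal sketch (the function name `split_x86_64` is illustrative):

```python
# Decompose a 48-bit x86-64 virtual address into the four 9-bit table
# indices (PML4, PDPT, PD, PT) plus the 12-bit page offset.

def split_x86_64(vaddr):
    offset = vaddr & 0xFFF          # bits 11..0
    pt     = (vaddr >> 12) & 0x1FF  # bits 20..12
    pd     = (vaddr >> 21) & 0x1FF  # bits 29..21
    pdpt   = (vaddr >> 30) & 0x1FF  # bits 38..30
    pml4   = (vaddr >> 39) & 0x1FF  # bits 47..39
    return pml4, pdpt, pd, pt, offset
```

Since 9 + 9 + 9 + 9 + 12 = 48, shifting the five fields back into place reconstructs the original address exactly.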

2.4 Inverted Page Table

  • Indexed by physical frame number (rather than virtual page number)
  • Only one table system-wide, size = number of physical frames
  • Lookup requires hashing or linear search
  • Used in IA-64, PowerPC

2.5 TLB (Translation Lookaside Buffer)

Hardware cache for page table translations (see Memory Hierarchy Design for details).

\[ \text{EAT} = h \cdot (t_{\text{TLB}} + t_{\text{mem}}) + (1-h) \cdot (t_{\text{TLB}} + k \cdot t_{\text{mem}} + t_{\text{mem}}) \]

where \(h\) is the TLB hit rate, \(k\) is the number of page table levels, and \(t_{\text{mem}}\) is the memory access time.

Numerical Example

TLB hit rate = 99%, TLB access = 1 ns, memory access = 100 ns, 4-level page table:

\[ \text{EAT} = 0.99 \times (1 + 100) + 0.01 \times (1 + 4 \times 100 + 100) = 0.99 \times 101 + 0.01 \times 501 = 99.99 + 5.01 = 105 \text{ ns} \]

Without TLB: each memory access would cost \(5 \times 100 = 500\) ns -- the TLB reduces access time by 4.7x.
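The EAT formula is easy to check numerically; a small helper (illustrative, with times in whatever unit you pass in):

```python
def eat(hit_rate, t_tlb, t_mem, levels):
    """Effective access time for a TLB with the given hit rate and
    a `levels`-deep page table (matches the formula above)."""
    hit  = t_tlb + t_mem                   # TLB hit: one memory access for the data
    miss = t_tlb + levels * t_mem + t_mem  # miss: walk every level, then fetch data
    return hit_rate * hit + (1 - hit_rate) * miss

# 99% hit rate, 1 ns TLB, 100 ns memory, 4 levels -> 105 ns, as computed above
```

Plugging in a 90% hit rate instead gives about 141 ns, which shows how sensitive EAT is to even small drops in the hit rate.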

3. Segmentation

Divides the address space into semantically meaningful segments (code, data, stack segments, etc.), each with an independent base address and length.

\[ \text{Physical address} = \text{Base}[\text{segment}] + \text{Offset} \]
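The translation plus the limit check can be sketched as follows (illustrative names; real hardware keeps base/limit in segment descriptor tables):

```python
def seg_translate(segment, offset, base, limit):
    """Return base[segment] + offset, faulting if the offset exceeds the limit."""
    if offset >= limit[segment]:
        # Hardware would raise a protection fault here
        raise MemoryError("segmentation fault: offset beyond segment limit")
    return base[segment] + offset
```

The per-segment limit check is what makes out-of-bounds accesses fault instead of silently reading a neighboring segment.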

| Paging vs Segmentation | Paging | Segmentation |
| --- | --- | --- |
| Unit size | Fixed | Variable |
| External fragmentation | None | Present |
| Internal fragmentation | Present (last page) | None |
| Programmer visibility | Transparent | Visible (logical division) |

Modern approach: Segmentation + paging combined (x86 retains segmentation but sets segment bases to 0, effectively pure paging).

4. Page Replacement Algorithms

When physical memory is insufficient, pages must be swapped out to disk.

4.1 Optimal Algorithm (OPT / Belady's MIN)

Replace the page that will not be used for the longest time in the future.

  • Theoretically optimal; serves as performance upper bound for other algorithms
  • Not implementable (requires future knowledge)

4.2 FIFO

Replace the page that entered memory earliest.

Belady's Anomaly

FIFO can exhibit the counterintuitive phenomenon where increasing the number of page frames increases the page fault rate.

Example: Reference sequence 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

  • 3 frames: 9 page faults
  • 4 frames: 10 page faults (more!)
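The anomaly is easy to reproduce with a small simulator (a sketch, not production code):

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement with `frames` page frames."""
    memory = deque()
    faults = 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.popleft()       # evict the page that arrived earliest
            memory.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
# fifo_faults(refs, 3) -> 9 faults; fifo_faults(refs, 4) -> 10 faults
```

Running it on the reference sequence above confirms 9 faults with 3 frames and 10 with 4: adding a frame made things worse.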

4.3 LRU (Least Recently Used)

Replace the page that has not been used for the longest time.

  • Based on temporal locality: past access patterns predict the future
  • Exact implementation requires maintaining access timestamps or a stack -> high hardware overhead
  • A stack algorithm; does not suffer from Belady's anomaly
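Exact LRU can be modeled with an ordered map that tracks recency (a simulation sketch; real hardware cannot afford this bookkeeping per access, which is the point of the bullet above):

```python
from collections import OrderedDict

def lru_faults(refs, frames):
    """Count page faults under exact LRU replacement."""
    memory = OrderedDict()                 # insertion order = recency order
    faults = 0
    for page in refs:
        if page in memory:
            memory.move_to_end(page)       # refresh recency on a hit
        else:
            faults += 1
            if len(memory) == frames:
                memory.popitem(last=False) # evict the least recently used page
            memory[page] = None
    return faults
```

On the Belady sequence from the previous section, LRU gives 10 faults with 3 frames and 8 with 4: more frames never increase the fault count, consistent with LRU being a stack algorithm.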

4.4 Clock Algorithm (Approximate LRU)

The most commonly used replacement algorithm in actual operating systems:

Circular linked list with a clock hand scanning pages:

  Check reference bit:
    1 -> Clear it, advance hand (give a second chance)
    0 -> Replace this page, advance hand
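The hand-and-reference-bit loop above can be sketched as a simulation (illustrative; a real kernel sets the reference bit in hardware via the page table entry):

```python
def clock_faults(refs, frames):
    """Count page faults under the second-chance (clock) algorithm."""
    pages = [None] * frames    # circular buffer of resident pages
    ref_bit = [0] * frames
    hand = 0
    faults = 0
    for page in refs:
        if page in pages:
            ref_bit[pages.index(page)] = 1  # hardware sets this on each access
            continue
        faults += 1
        while ref_bit[hand] == 1:           # second chance: clear bit, move on
            ref_bit[hand] = 0
            hand = (hand + 1) % frames
        pages[hand] = page                  # found ref_bit == 0: replace here
        ref_bit[hand] = 1
        hand = (hand + 1) % frames
    return faults
```

On the Belady sequence with 3 frames this gives 9 faults, between FIFO and the far costlier exact LRU, which is why it is the usual practical compromise.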

Enhanced Clock (considering dirty bit):

| (reference, dirty) | Priority | Meaning |
| --- | --- | --- |
| (0, 0) | Best to replace | Not recently used and not modified |
| (0, 1) | Second best | Not recently used but modified (needs write-back) |
| (1, 0) | Worse | Recently used but not modified |
| (1, 1) | Worst | Recently used and modified |

4.5 Algorithm Comparison

| Algorithm | Fault rate | Overhead | Belady's anomaly | Practicality |
| --- | --- | --- | --- | --- |
| OPT | Lowest | -- | No | Not implementable |
| LRU | Close to OPT | High | No | Expensive to implement exactly |
| Clock | Close to LRU | Low | No | Most commonly used |
| FIFO | Higher | Lowest | Yes | Simple scenarios |

5. Thrashing

5.1 Definition

When a process's working set exceeds available physical memory, frequent page faults cause the system to spend most of its time on page swapping, and CPU utilization drops sharply.

5.2 Working Set Model

Working set \(W(t, \Delta)\): the set of distinct pages a process references in the most recent time window of length \(\Delta\) ending at time \(t\).

\[ \text{Total demand} = \sum_{i} |W_i(t, \Delta)| \]

If total demand > physical memory -> thrashing occurs.
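Computed over a reference string, the working set is just the distinct pages in a sliding window (a sketch using reference-count positions as the "time" axis):

```python
def working_set(refs, t, delta):
    """Pages referenced in the window (t - delta, t] of the reference string."""
    start = max(0, t - delta)
    return set(refs[start:t])

refs = [1, 2, 1, 3, 1, 2, 4, 4, 4, 4]
# With delta = 4: at t = 6 the working set is {1, 2, 3};
# by t = 10 the process has shifted locality and it shrinks to {4}.
```

The shrinking set at the end illustrates why \(\Delta\) matters: it must be large enough to cover the current locality, but small enough to drop pages from the previous one.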

OS strategies:

  • Monitor each process's working set size
  • If total demand exceeds memory, suspend some processes (medium-term scheduling)
  • Reactivate when conditions are met

5.3 PFF (Page Fault Frequency)

A more practical thrashing detection method:

  • Monitor each process's page fault rate
  • Fault rate > upper threshold -> allocate more frames
  • Fault rate < lower threshold -> reclaim some frames
  • No frames available to allocate -> suspend process

6. Memory Allocation

6.1 Kernel-Level Memory Allocation

Contiguous Memory Allocation

| Algorithm | Strategy | Characteristics |
| --- | --- | --- |
| First Fit | Use the first sufficiently large free block | Fast; fragments accumulate at low addresses |
| Best Fit | Use the smallest sufficiently large free block | Reduces waste but searches slowly; creates tiny fragments |
| Worst Fit | Use the largest free block | Fewer tiny fragments, but large blocks deplete quickly |
| Next Fit | Resume searching from the last allocation point | Spreads allocations more evenly |
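First Fit over a free list of `(start, length)` blocks can be sketched in a few lines (illustrative only; it splits the chosen block but does not handle freeing or coalescing):

```python
def first_fit(free_list, size):
    """Allocate `size` bytes from a list of (start, length) free blocks.

    Returns the start address, or None if no block is large enough.
    The chosen block is shrunk in place (split)."""
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)                              # exact fit
            else:
                free_list[i] = (start + size, length - size)  # split the block
            return start
    return None

free = [(0, 100), (200, 500), (1000, 50)]
# first_fit(free, 120) skips the 100-byte block and carves 120 bytes
# out of (200, 500), leaving (320, 380) on the free list.
```

Best Fit and Worst Fit differ only in the scan: instead of stopping at the first match, they pick the smallest or largest qualifying block.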

Buddy System

The foundation of Linux physical memory allocation:

  • Block sizes are \(2^k\) pages (\(0 \leq k \leq \text{MAX\_ORDER}-1\))
  • Allocation: Find the smallest \(2^k\) block satisfying the request; if too large, split in half
  • Deallocation: Check if the buddy block is free; if so, merge into a \(2^{k+1}\) block
\[ \text{Internal fragmentation} \leq 50\% \quad (\text{worst case: request } 2^{k-1}+1\text{, allocate }2^k) \]

Advantage: Merge operations are simple and efficient (buddy addresses differ by only one bit)
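That one-bit property makes finding a block's buddy a single XOR (a sketch with an assumed 4 KB page size; real allocators work with frame numbers, but the idea is identical):

```python
def buddy_of(block_addr, order, page_size=4096):
    """Address of the buddy of a 2**order-page block: flip exactly one bit."""
    return block_addr ^ (page_size << order)

# For order 3 (an 8-page, 32 KB block): the buddy of address 0x0000 is
# 0x8000, and vice versa -- applying buddy_of twice returns the original.
```

At free time the allocator computes `buddy_of(addr, order)`, and if that block is also free at the same order, the two merge into one block of the next order up.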

Slab Allocator

Linux kernel objects (task_struct, inode, etc.) are frequently allocated and freed. Slab provides object-level caching on top of Buddy:

  • Pre-allocates pools of specific-size objects
  • Released objects are retained in the cache for reuse
  • Eliminates internal fragmentation, reduces initialization overhead

6.2 User-Level Memory Allocation

malloc / free implementations (e.g., glibc ptmalloc, jemalloc, tcmalloc):

| Allocator | Characteristics |
| --- | --- |
| ptmalloc (glibc) | Multiple arenas reduce lock contention; bins categorized by size |
| jemalloc (FreeBSD/Firefox) | Thread-local caches; better multi-thread scalability |
| tcmalloc (Google) | Thread-local allocation for small objects, centralized management for large ones |
| mimalloc (Microsoft) | Extreme small-object performance |

7. Memory-Mapped I/O

mmap maps files or devices into a process's address space:

| Use | Description |
| --- | --- |
| File mapping | Map file contents into memory, loaded on demand (lazy loading) |
| Anonymous mapping | Allocate memory not backed by a file (how malloc obtains large blocks) |
| Shared mapping | Multiple processes share the same physical pages (IPC, shared libraries) |
| Private mapping | Copy-on-write: pages are copied only when written |
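Python's `mmap` module wraps the same system call; a minimal file-mapping sketch:

```python
import mmap
import os
import tempfile

# Create a small file, then map it into this process's address space.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello, mmap")
mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)  # length 0 = map the whole file
data = mm[0:5]       # a read through the mapping; pages fault in on demand
mm.close()
os.close(fd)
os.unlink(path)
```

Slicing the mapping reads through the page cache rather than issuing explicit `read()` calls, which is exactly the lazy-loading behavior described in the table above.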

Copy-on-Write (COW)

On fork(), parent and child share physical pages (marked read-only):

  1. Parent or child attempts to write -> triggers page protection fault
  2. Kernel copies the page -> each has an independent copy
  3. If child immediately calls exec() -> most pages are never copied (highly efficient)

8. Memory Protection and Security

| Mechanism | Function |
| --- | --- |
| Page table permission bits | Read/write/execute permission control |
| NX bit | Mark pages as non-executable (blocks direct code injection) |
| ASLR | Address Space Layout Randomization (mitigates ROP and other code-reuse attacks) |
| Guard page | Stack overflow detection (inaccessible sentinel page) |
| SMEP/SMAP | Prevent kernel mode from executing/accessing user-space memory |

9. Performance Optimization Practices

| Problem | Optimization |
| --- | --- |
| Excessive TLB misses | Use huge pages; reduce address-space fragmentation |
| High page fault rate | Add physical memory; improve data locality |
| Memory fragmentation | Memory compaction; slab allocator |
| NUMA latency | NUMA-aware allocation (numactl, mbind) |
| Excessive swapping | Add memory; tune swappiness; use zswap compression |

