Operating System
An operating system (OS) is the most fundamental piece of system software in a computer system. It manages hardware resources and provides a unified abstraction layer for higher-level applications. The core responsibilities of an OS include: process and thread management (CPU time allocation and scheduling), memory management (mapping between virtual address spaces and physical memory), file systems (persistent storage and organization of data), I/O management (device drivers and buffering strategies), and synchronization and security (concurrency control and privilege isolation).
Understanding operating systems is foundational to grasping computer architecture, parallel computing, distributed systems, and even deep learning training frameworks (e.g., GPU scheduling, GPU memory management). These notes cover the core topics of operating systems and are suitable for course review and interview preparation.
Why It Belongs to Computer Science
Operating systems belong to computer science not because they "perform more calculations," but because they study how computation is organized on a single machine. An OS turns raw hardware resources into programmable abstractions such as process / thread, virtual memory, file system, and socket, and uses scheduling, synchronization, isolation, and protection to let many programs run correctly and efficiently on the same machine.
It certainly overlaps with computer engineering, especially around interrupts, the MMU, device drivers, and real-time systems. But its central questions are still about the abstraction, allocation, and control of computational resources, which is why it fits most naturally under Computer Science.
Process Management
Process vs. Thread
| Dimension | Process | Thread |
|---|---|---|
| Definition | Basic unit of resource allocation | Basic unit of CPU scheduling |
| Address space | Has its own independent address space | Shares the address space of its parent process |
| Resource overhead | High cost to create/destroy | Low cost to create/destroy |
| Communication | Requires IPC mechanisms (pipes, shared memory, etc.) | Can directly read/write shared variables |
| Context switch cost | High (requires switching page tables, etc.) | Low |
| Safety | A crash in one process does not affect others | A crash in one thread may bring down the entire process |
| Concurrency | Multi-process concurrency | Multi-thread concurrency; threads within the same process share the heap |
User-Level Threads vs. Kernel-Level Threads
- User-Level Thread: Managed by a user-space thread library; the kernel is unaware of them. Fast to switch but cannot leverage multiple cores.
- Kernel-Level Thread: Managed and scheduled by the kernel; can exploit multi-core parallelism, but switching overhead is higher.
- Mapping models: Many-to-One, One-to-One, Many-to-Many.
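As a concrete illustration of the one-to-one model, the sketch below uses POSIX threads, which Linux implements by backing each pthread with a kernel thread, so the two workers can run on different cores in parallel. The file name in the compile hint is illustrative.

```c
// Minimal pthread example (compile with: gcc demo.c -pthread).
// On Linux, each pthread maps 1:1 onto a kernel-level thread.
#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1);
    pthread_create(&t2, NULL, worker, (void *)2);
    pthread_join(t1, NULL);   // wait for both workers to finish
    pthread_join(t2, NULL);
    return 0;
}
```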
Process State Transitions
A process goes through the following states during its lifecycle:
                +---------+
                |   New   |
                +----+----+
                     | admitted
                     v
                +---------+  scheduler dispatch   +----------+
      +-------->|  Ready  |---------------------->| Running  |----+
      |         |         |<----------------------|          |    |
      |         +---------+  preempt / quantum    +----+-----+    |
      |                        expires                 |          | exit
      |                                    I/O request |          v
      | I/O or event                       or event    |    +------------+
      | complete       +----------+        wait        |    | Terminated |
      +----------------| Waiting  |<-------------------+    +------------+
                       +----------+
- New → Ready: The process is created and, once resources are allocated, enters the ready queue
- Ready → Running: The scheduler selects the process and assigns it the CPU
- Running → Waiting: The process issues an I/O request or waits for an event
- Running → Ready: The time quantum expires (preemption) or a higher-priority process arrives
- Waiting → Ready: The I/O operation completes or the awaited event occurs
- Running → Terminated: The process finishes execution or is forcibly terminated
Process Control Block (PCB)
The PCB is the core data structure used by the OS to describe and manage a process. Each process has a unique PCB.
| Field | Description |
|---|---|
| PID (Process ID) | Unique identifier for the process |
| Process state | New / Ready / Running / Waiting / Terminated |
| Program Counter (PC) | Address of the next instruction to execute |
| CPU registers | Current values of all registers |
| CPU scheduling info | Priority, scheduling queue pointers |
| Memory management info | Page table base address, segment table, address space boundaries |
| I/O status info | List of open files, allocated I/O devices |
| Accounting info | CPU time used, time limits, etc. |
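To make the table concrete, the PCB can be pictured as a C struct. The sketch below is purely illustrative; the field names are hypothetical, and a real kernel's equivalent (e.g., Linux's task_struct) contains far more fields.

```c
// Illustrative PCB sketch; field names are hypothetical.
typedef enum { NEW, READY, RUNNING, WAITING, TERMINATED } proc_state_t;

typedef struct pcb {
    int            pid;             // unique process identifier
    proc_state_t   state;           // current scheduling state
    unsigned long  pc;              // saved program counter
    unsigned long  regs[32];        // saved general-purpose registers
    int            priority;        // CPU scheduling info
    unsigned long  page_table_base; // memory management info
    int            open_files[16];  // I/O status: open file descriptors
    unsigned long  cpu_time_used;   // accounting info
    struct pcb    *next;            // ready/wait queue linkage
} pcb_t;
```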
Context Switch
A context switch is the process of saving the state of the currently running process and restoring the state of the next process when the CPU switches from one process to another.
Steps involved:
- Save the current process's PCB (registers, PC, stack pointer, etc.)
- Update the current process's state (Running → Ready / Waiting)
- Place the current process's PCB in the appropriate queue
- Select the next process to run (scheduling decision)
- Restore the new process's PCB
- Update the new process's state (Ready → Running)
- Jump to the new process's PC and resume execution
Context Switch Overhead
A context switch is pure system overhead — the system performs no "useful" work during the switch. Sources of overhead include:
- Saving/restoring registers
- Flushing the TLB (page table cache)
- Flushing CPU caches (cache pollution)
- Pipeline drain
A typical context switch takes on the order of microseconds (usually 1–10 µs).
Inter-Process Communication (IPC)
| IPC Mechanism | Description | Characteristics |
|---|---|---|
| Pipe | Half-duplex, unidirectional data flow; includes anonymous pipes and named pipes (FIFOs) | Simple to use, but anonymous pipes are limited to related processes |
| Message Queue | A linked list of messages maintained in the kernel; processes can send/receive messages | Supports message priorities; no explicit synchronization needed |
| Shared Memory | Multiple processes map the same physical memory region into their respective address spaces | Fastest method, but requires a synchronization mechanism (e.g., semaphores) |
| Semaphore | A counter used to control concurrent access to shared resources | Primarily used for synchronization, not data transfer |
| Signal | Asynchronous notification mechanism, e.g., SIGKILL, SIGTERM, SIGINT | Low overhead, but limited in the amount of information conveyed |
| Socket | Supports communication between different hosts | Most versatile; can be used for network communication |
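As a minimal illustration of the pipe mechanism from the table above, the following C sketch creates an anonymous pipe and forks a child that reads what the parent writes:

```c
// Anonymous pipe between related processes: parent writes, child reads.
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                        // fd[0] = read end, fd[1] = write end
    char buf[64];
    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {                // child: reader
        close(fd[1]);                 // close unused write end
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child got: %s\n", buf);
        }
        close(fd[0]);
        return 0;
    }
    close(fd[0]);                     // parent: writer
    const char *msg = "hello via pipe";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);                     // EOF for the reader
    wait(NULL);
    return 0;
}
```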
CPU Scheduling
Scheduling Criteria
| Criterion | Definition | Optimization Goal |
|---|---|---|
| CPU Utilization | Proportion of time the CPU is busy | Maximize |
| Throughput | Number of processes completed per unit of time | Maximize |
| Turnaround Time | Total time from submission to completion: \(T_{turnaround} = T_{completion} - T_{arrival}\) | Minimize |
| Waiting Time | Total time spent waiting in the ready queue | Minimize |
| Response Time | Time from submission to the first response: \(T_{response} = T_{first\_run} - T_{arrival}\) | Minimize |
Preemptive vs. Non-Preemptive
- Non-preemptive: Once a process obtains the CPU, it runs until completion or voluntarily yields (e.g., for I/O).
- Preemptive: The scheduler can interrupt the current process at any time and reassign the CPU to another process. Most modern operating systems use preemptive scheduling.
Main Scheduling Algorithms
| Algorithm | Preemptive? | Advantages | Disadvantages | Use Cases |
|---|---|---|---|---|
| FCFS (First Come, First Served) | No | Simple to implement | Convoy effect — short processes wait behind long ones | Batch systems |
| SJF (Shortest Job First) | No | Optimal average waiting time (non-preemptive case) | May starve long jobs; requires advance knowledge of execution time | Batch systems with known runtimes |
| SRTF (Shortest Remaining Time First) | Yes | Optimal average waiting time (preemptive case) | Same as SJF; starvation is more severe | Same as SJF |
| Round Robin | Yes | Fair; good response time | Large quantum degenerates to FCFS; small quantum increases context switch overhead | Time-sharing systems |
| Priority | Optional | Flexible; supports differentiating task importance | Low-priority processes may starve (solvable via aging) | General-purpose systems |
| MLFQ (Multi-Level Feedback Queue) | Yes | Balances interactive and batch processes; adaptive | Complex to implement; parameter tuning is difficult | General-purpose OSes (Linux, Windows) |
Scheduling Algorithm Examples
FCFS Example
Process arrival order: P1(burst=24), P2(burst=3), P3(burst=3), all arriving at t=0.
Gantt chart:
| P1 | P2 | P3 |
0 24 27 30
- Average waiting time = (0 + 24 + 27) / 3 = 17
SJF Example
Process arrival order: P1(burst=6), P2(burst=8), P3(burst=7), P4(burst=3), all arriving at t=0.
Gantt chart:
| P4 | P1 | P3 | P2 |
0 3 9 16 24
- Average waiting time = (3 + 16 + 9 + 0) / 4 = 7
SRTF Example
| Process | Arrival Time | Burst |
|---|---|---|
| P1 | 0 | 8 |
| P2 | 1 | 4 |
| P3 | 2 | 2 |
Gantt chart:
| P1 | P2 | P3 | P2 | P1 |
0    1    2    4    7    14
- P1 waiting = (0 - 0) + (7 - 1) = 6, P2 waiting = (1 - 1) + (4 - 2) = 2, P3 waiting = (2 - 2) = 0
- Average waiting time = (6 + 2 + 0) / 3 = 2.67
Round Robin Example (time quantum q=4)
Process arrival order: P1(burst=24), P2(burst=3), P3(burst=3), all arriving at t=0.
Gantt chart:
| P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
0 4 7 10 14 18 22 26 30
- Average waiting time = ((30-24) + 4 + 7) / 3 = 5.67
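The quantum-by-quantum accounting above can be checked with a small simulation. The sketch below hard-codes this example; since all processes arrive at t=0, a simple cyclic scan of the ready queue reproduces the Gantt order.

```c
// Round-robin simulation (q = 4) for P1=24, P2=3, P3=3, all arriving at t=0.
#include <stdio.h>

int main(void) {
    int burst[] = {24, 3, 3};
    int n = 3, q = 4;
    int remaining[3], completion[3];
    for (int i = 0; i < n; i++) remaining[i] = burst[i];

    int t = 0, done = 0;
    while (done < n) {
        for (int i = 0; i < n; i++) {          // cycle through the ready queue
            if (remaining[i] == 0) continue;
            int slice = remaining[i] < q ? remaining[i] : q;
            t += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) { completion[i] = t; done++; }
        }
    }
    double total_wait = 0;
    for (int i = 0; i < n; i++) {
        int wait = completion[i] - burst[i];   // all arrivals are at t=0
        printf("P%d waiting = %d\n", i + 1, wait);
        total_wait += wait;
    }
    printf("average waiting = %.2f\n", total_wait / n);  // prints 5.67
    return 0;
}
```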
Choosing the Time Quantum for Round Robin
- If \(q\) is too large → degenerates to FCFS
- If \(q\) is too small → context switch overhead increases
- Rule of thumb: \(q\) should be large enough so that 80% of CPU bursts complete within a single quantum
Priority Scheduling Example
| Process | Burst | Priority (lower number = higher priority) |
|---|---|---|
| P1 | 10 | 3 |
| P2 | 1 | 1 |
| P3 | 2 | 4 |
| P4 | 1 | 5 |
| P5 | 5 | 2 |
Gantt chart:
| P2| P5 | P1 | P3 | P4|
0 1 6 16 18 19
- Average waiting time = (6 + 0 + 16 + 18 + 1) / 5 = 8.2
MLFQ (Multi-Level Feedback Queue)
Core rules of MLFQ:
- If Priority(A) > Priority(B), run A
- If Priority(A) = Priority(B), run in round-robin
- New processes enter the highest-priority queue
- If a process exhausts its time allotment at the current level, it is demoted to the next lower queue
- After a time period \(S\) (boost interval), all processes are promoted to the highest-priority queue (to prevent starvation)
Queue 0 (highest priority, q=8ms):  | P_new | ...
Queue 1 (medium priority, q=16ms):  | P_cpu_bound | ...
Queue 2 (lowest priority, FCFS):    | P_long_running | ...
Synchronization
The Critical Section Problem
A critical section is a segment of code in which a process accesses shared resources. At most one process may be inside its critical section at any given time.
A correct solution to the critical section problem must satisfy three conditions:
- Mutual Exclusion: At most one process may be in its critical section at any time
- Progress: If no process is in its critical section and some processes wish to enter, the decision of which one enters next cannot be postponed indefinitely
- Bounded Waiting: There must be an upper bound on the number of times other processes can enter their critical sections after a process has requested entry
Peterson's Solution
Peterson's algorithm is a classic software solution for mutual exclusion between two processes.
// Code for process i (the other process is j = 1 - i)
bool flag[2]; // flag[i] == true means process i wants to enter its critical section
int turn;     // whose turn it is to enter the critical section

// Process i enters its critical section
flag[i] = true;
turn = j;                        // politely yield to the other process
while (flag[j] && turn == j)
    ;                            // busy waiting

// --- critical section ---

flag[i] = false;                 // exit the critical section
Limitations of Peterson's Solution
Peterson's algorithm is not guaranteed to be correct on modern multi-core processors because the CPU and compiler may reorder instructions. In practice, memory barriers or hardware atomic instructions are required.
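For illustration, here is a sketch of Peterson's algorithm written with C11 atomics. The default sequentially consistent ordering prevents the store/load reordering that breaks the plain-variable version; even so, production code should prefer mutexes or atomic read-modify-write instructions.

```c
// Peterson's algorithm with C11 atomics (seq_cst ordering by default).
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool flag[2];
atomic_int  turn;

void lock(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], true);   // I want to enter
    atomic_store(&turn, j);         // but politely yield first
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ;                           // busy-wait (spin)
}

void unlock(int i) {
    atomic_store(&flag[i], false);  // leave the critical section
}
```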
Mutex Locks and Semaphores
Mutex Lock:
acquire(lock);   // blocks if the lock is already held
// --- critical section ---
release(lock);   // release the lock
Semaphore: An integer variable accessed through two atomic operations — wait() (P operation) and signal() (V operation).
wait(S) {        // P operation
    while (S <= 0)
        ;        // busy wait (or block)
    S--;
}

signal(S) {      // V operation
    S++;
}
| Type | Description |
|---|---|
| Binary Semaphore | Value is 0 or 1; behaves like a mutex lock (though a mutex additionally has the notion of an owner) |
| Counting Semaphore | Value can be any non-negative integer; used to control concurrent access to a finite number of resources |
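A usage sketch with the POSIX semaphore API, where sem_wait is the P operation and sem_post is the V operation. Here a counting semaphore limits a region to three concurrent threads; the thread count and slot count are illustrative.

```c
// Counting semaphore with the POSIX API, limiting concurrency to 3.
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t slots;

void *worker(void *arg) {
    sem_wait(&slots);               // P: acquire a slot (blocks if none free)
    printf("thread %ld in limited region\n", (long)arg);
    sem_post(&slots);               // V: release the slot
    return NULL;
}

int main(void) {
    sem_init(&slots, 0, 3);         // initial value 3, private to this process
    pthread_t t[5];
    for (long i = 0; i < 5; i++) pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 5; i++) pthread_join(t[i], NULL);
    sem_destroy(&slots);
    return 0;
}
```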
Classic Synchronization Problems
Producer-Consumer Problem
Also known as the Bounded Buffer Problem.
semaphore mutex = 1;   // mutual exclusion on the buffer
semaphore empty = N;   // number of empty slots
semaphore full = 0;    // number of filled slots

// Producer
void producer() {
    while (true) {
        item = produce_item();
        wait(empty);       // wait for an empty slot
        wait(mutex);       // enter the critical section
        insert(item);
        signal(mutex);     // leave the critical section
        signal(full);      // notify a consumer
    }
}

// Consumer
void consumer() {
    while (true) {
        wait(full);        // wait for data
        wait(mutex);       // enter the critical section
        item = remove();
        signal(mutex);     // leave the critical section
        signal(empty);     // notify a producer
        consume(item);
    }
}
Order of wait Operations Matters
The order of wait(empty) and wait(mutex) must not be reversed; otherwise, a deadlock may occur: the producer holds the mutex but the buffer is full, so it blocks on empty, while the consumer blocks on mutex and cannot consume.
Readers-Writers Problem
- Multiple readers may read simultaneously
- A writer must have exclusive access
semaphore rw_mutex = 1;   // reader/writer exclusion
semaphore mutex = 1;      // protects read_count
int read_count = 0;

// Writer
void writer() {
    wait(rw_mutex);
    // --- write ---
    signal(rw_mutex);
}

// Reader
void reader() {
    wait(mutex);
    read_count++;
    if (read_count == 1)
        wait(rw_mutex);   // first reader locks out writers
    signal(mutex);
    // --- read ---
    wait(mutex);
    read_count--;
    if (read_count == 0)
        signal(rw_mutex); // last reader lets writers back in
    signal(mutex);
}
Variants of the Readers-Writers Problem
- First variant (readers' priority): The solution above; may starve writers.
- Second variant (writers' priority): No new readers may enter while a writer is waiting.
- In practice, read-write locks (rwlock) are commonly used, as sketched below.
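A minimal usage sketch of the POSIX pthread_rwlock API, assuming a single shared integer: many readers may hold the lock at once, while a writer holds it exclusively.

```c
// Reader-writer lock with the POSIX API.
#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_value;

int read_value(void) {
    pthread_rwlock_rdlock(&rwlock);   // shared (read) lock
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void write_value(int v) {
    pthread_rwlock_wrlock(&rwlock);   // exclusive (write) lock
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}
```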
Dining Philosophers Problem
Five philosophers sit around a circular table with one chopstick between each pair. Philosophers alternate between thinking and eating; eating requires picking up both the left and right chopstick.
Naive solution (may deadlock):
semaphore chopstick[5] = {1, 1, 1, 1, 1};

void philosopher(int i) {
    while (true) {
        think();
        wait(chopstick[i]);         // pick up left chopstick
        wait(chopstick[(i+1)%5]);   // pick up right chopstick
        eat();
        signal(chopstick[(i+1)%5]);
        signal(chopstick[i]);
    }
}
Strategies to avoid deadlock:
- Allow at most 4 philosophers to pick up chopsticks simultaneously
- Odd-numbered philosophers pick up the left chopstick first; even-numbered pick up the right first (breaks circular wait; see the sketch after this list)
- Use trylock: if a chopstick cannot be acquired, release the one already held (breaks hold-and-wait)
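A sketch of the asymmetric-order strategy, reusing the wait/signal pseudocode from above. Because neighbors disagree on which chopstick to grab first, a cycle of philosophers each holding one chopstick cannot form.

```c
// Asymmetric acquisition order: even philosophers take the right
// chopstick first, odd philosophers the left, breaking circular wait.
void philosopher(int i) {
    int left = i, right = (i + 1) % 5;
    while (true) {
        think();
        if (i % 2 == 0) {            // even: right first
            wait(chopstick[right]);
            wait(chopstick[left]);
        } else {                     // odd: left first
            wait(chopstick[left]);
            wait(chopstick[right]);
        }
        eat();
        signal(chopstick[left]);
        signal(chopstick[right]);
    }
}
```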
Deadlock
Four Necessary Conditions for Deadlock
A deadlock can occur only when all four of the following conditions hold simultaneously:
- Mutual Exclusion: A resource cannot be shared by multiple processes at the same time
- Hold and Wait: A process holds at least one resource while waiting to acquire additional resources
- No Preemption: Resources cannot be forcibly taken away; they can only be released voluntarily by the holder
- Circular Wait: There exists a set of processes \(\{P_0, P_1, \dots, P_n\}\) such that \(P_i\) is waiting for a resource held by \(P_{(i+1) \mod (n+1)}\)
Deadlock Handling Strategies
| Strategy | Method | Description |
|---|---|---|
| Prevention | Break one of the four necessary conditions | E.g., require a process to request all resources at once (breaks hold-and-wait); number resources and request them in order (breaks circular wait) |
| Avoidance | Dynamically check whether resource allocation is safe | Banker's algorithm; requires advance knowledge of maximum resource demands |
| Detection | Allow deadlocks to occur; detect them periodically | Resource Allocation Graph; Wait-for Graph |
| Recovery | Take action after detecting a deadlock | Terminate processes (one by one or all at once); resource preemption (roll back process state) |
What Real Systems Do
Most operating systems (e.g., Linux, Windows) adopt the ostrich approach: they ignore the deadlock problem and leave it to application developers. The overhead of deadlock prevention/avoidance often exceeds the cost of occasional deadlocks.
Banker's Algorithm
The Banker's algorithm is used for deadlock avoidance. Its core idea is: before allocating resources, check whether the system would remain in a safe state after the allocation.
Data structures (\(n\) processes, \(m\) resource types):
- \(Available[m]\): Number of currently available instances of each resource type
- \(Max[n][m]\): Maximum demand of each process for each resource type
- \(Allocation[n][m]\): Resources currently allocated to each process
- \(Need[n][m]\): Remaining resource needs of each process, where \(Need[i][j] = Max[i][j] - Allocation[i][j]\)
Safety check algorithm (implemented in the sketch after this list):
1. Let \(Work = Available\) and \(Finish[i] = false\) for all \(i\)
2. Find a process \(P_i\) such that \(Finish[i] = false\) and \(Need[i] \leq Work\)
3. If such a \(P_i\) exists, set \(Work = Work + Allocation[i]\) and \(Finish[i] = true\), then go back to step 2
4. If \(Finish[i] = true\) for all processes, the system is in a safe state
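A minimal C sketch of this safety check, assuming n = 5 processes and m = 3 resource types (the sizes are illustrative):

```c
// Banker's safety check: returns 1 if the current state is safe.
#include <string.h>

#define N 5   // number of processes (illustrative)
#define M 3   // number of resource types (illustrative)

int is_safe(int available[M], int need[N][M], int allocation[N][M]) {
    int work[M], finish[N] = {0};
    memcpy(work, available, sizeof(work));

    for (int count = 0; count < N; count++) {
        int found = 0;
        for (int i = 0; i < N && !found; i++) {
            if (finish[i]) continue;
            int ok = 1;                       // can P_i's need be met by work?
            for (int j = 0; j < M; j++)
                if (need[i][j] > work[j]) { ok = 0; break; }
            if (ok) {                         // pretend P_i runs to completion
                for (int j = 0; j < M; j++) work[j] += allocation[i][j];
                finish[i] = 1;
                found = 1;
            }
        }
        if (!found) break;                    // no runnable process remains
    }
    for (int i = 0; i < N; i++)
        if (!finish[i]) return 0;             // someone can never finish: unsafe
    return 1;                                 // a safe sequence exists
}
```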
Safe State vs. Unsafe State
- Safe state: There exists a safe sequence in which all processes can complete successfully. A safe state guarantees no deadlock.
- Unsafe state: No safe sequence exists. An unsafe state does not necessarily mean deadlock, but there is a risk of deadlock.
Memory Management
Contiguous vs. Non-Contiguous Allocation
| Allocation Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Contiguous | Each process occupies a contiguous block of memory | Simple to implement; high access efficiency | External fragmentation |
| Non-contiguous | A process can be scattered across different memory locations | Reduces fragmentation | Requires additional address mapping mechanisms |
Contiguous allocation strategies:
- First Fit: Scan from the beginning and use the first free block that is large enough
- Best Fit: Find the smallest free block that is large enough (minimizes leftover fragmentation)
- Worst Fit: Use the largest free block
Types of Fragmentation
- External Fragmentation: The total free memory is sufficient, but it is non-contiguous and cannot be allocated to a process. Can be resolved through compaction.
- Internal Fragmentation: The memory allocated to a process exceeds its actual needs; the excess is wasted. Common in paging.
Paging
Paging is a non-contiguous memory allocation scheme:
- Physical memory is divided into fixed-size blocks called frames
- The logical address space is divided into blocks of the same size called pages
- A page table maps logical page numbers to physical frame numbers
Address translation:
A logical address consists of two parts: a page number (p) and a page offset (d). The physical address is computed as
\(PA = f \times 2^n + d\)
where \(f = PageTable[p]\) is the corresponding physical frame number and \(2^n\) is the page size.
If the page size is \(2^n\) bytes and the logical address space is \(2^m\) bytes, then:
- Page offset = lower \(n\) bits
- Page number = upper \((m - n)\) bits
- Number of page table entries = \(2^{m-n}\)
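These bit manipulations translate directly into code. A sketch assuming 32-bit addresses, 4 KiB pages (n = 12), and a flat single-level page table passed in as an array:

```c
// Split a logical address into page number and offset, then form the
// physical address from the frame number found in the page table.
#include <stdint.h>

#define PAGE_BITS 12                         // 4 KiB pages (n = 12)
#define PAGE_SIZE (1u << PAGE_BITS)

uint32_t translate(uint32_t vaddr, const uint32_t *page_table) {
    uint32_t p = vaddr >> PAGE_BITS;         // page number = upper (m - n) bits
    uint32_t d = vaddr & (PAGE_SIZE - 1);    // offset = lower n bits
    uint32_t f = page_table[p];              // frame number from the page table
    return (f << PAGE_BITS) | d;             // physical address = f * 2^n + d
}
```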
TLB (Translation Lookaside Buffer)
The TLB is a high-speed cache for the page table (typically located in the CPU's MMU) that accelerates address translation.
Effective access time (EAT):
\(EAT = \alpha \times (t_{TLB} + t_{mem}) + (1 - \alpha) \times (t_{TLB} + 2 \times t_{mem})\)
where \(\alpha\) is the TLB hit rate, \(t_{TLB}\) is the TLB lookup time, and \(t_{mem}\) is the time for one memory access (a TLB miss costs one extra memory access to read the page table entry).
Typical TLB Parameters
- Number of TLB entries: 64 – 1024
- Hit rate: typically > 99%
- The TLB must be flushed on context switch (or ASIDs can be used to identify processes)
Multi-Level Page Table
When the logical address space is very large, a single-level page table would consume excessive memory. Multi-level page tables split the page table into layers, allocating page table memory only for address ranges actually in use.
Taking a two-level page table as an example:
Logical address: | level-1 page number p1 | level-2 page number p2 | offset d |
        ↓
level-1 page table[p1] → base address of the level-2 page table
        ↓
level-2 page table[p2] → physical frame number f
        ↓
physical address = (f, d)
- Advantage: Saves memory (page table entries for unused regions are not allocated)
- Disadvantage: Adds one additional memory access per level of the page table
Segmentation
Segmentation divides the logical address space according to the logical structure of the program into several segments, such as the code segment, data segment, stack segment, etc.
Logical address: (segment number s, offset within segment d)
- Segment table entry: (base address, limit)
- Address translation: If \(d < limit\), physical address = \(base + d\); otherwise, a segmentation fault occurs
| Comparison | Paging | Segmentation |
|---|---|---|
| Division basis | Physical division, fixed size | Logical division, variable size |
| Fragmentation type | Internal fragmentation | External fragmentation |
| Visible to user? | Transparent to the user | User-visible (divided by logical modules) |
| Sharing and protection | Page-level granularity (coarse) | Segment-level granularity (convenient for sharing and protection) |
Virtual Memory
Virtual memory allows a process to use an address space larger than the available physical memory. The core idea: only load the pages currently needed into physical memory; keep the rest on disk.
Demand Paging
A page is loaded from disk into memory only when it is accessed.
- Page Fault: Triggered when the accessed page is not in memory
- Handling procedure: trap to kernel → locate page on disk → load into a free frame → update page table → re-execute the faulting instruction
Page fault rate and effective access time:
\(EAT = (1 - p) \times t_{mem} + p \times t_{page\_fault}\)
where \(p\) is the page fault rate, and \(t_{page\_fault}\) is typically on the order of milliseconds (including disk I/O), far greater than \(t_{mem}\) (nanoseconds).
The Cost of Page Faults
A single page fault takes about 10 ms to handle (disk access), while a normal memory access takes about 100 ns. Thus, a page fault is roughly \(10^5\) times slower than a regular memory access. Even a 0.1% page fault rate can cause significant performance degradation.
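Plugging representative numbers into the formula above (assuming \(t_{mem} = 100\) ns, \(t_{page\_fault} = 10\) ms, and \(p = 0.001\)):

\(EAT = 0.999 \times 100\,ns + 0.001 \times 10\,ms \approx 100\,ns + 10\,\mu s \approx 10.1\,\mu s\)

So even a 0.1% fault rate makes the average memory access roughly 100 times slower.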
Page Replacement Algorithms
When physical memory is full and a new page must be loaded, one existing page must be selected for eviction (replacement).
| Algorithm | Strategy | Advantages | Disadvantages |
|---|---|---|---|
| OPT (Optimal) | Replace the page that will not be used for the longest time in the future | Lowest page fault rate (theoretically optimal) | Impossible to implement (requires knowledge of future access patterns) |
| FIFO (First In, First Out) | Replace the page that has been in memory the longest | Simple to implement | Poor performance; suffers from Belady's anomaly |
| LRU (Least Recently Used) | Replace the page that has not been accessed for the longest time | Performance close to OPT | High implementation overhead (requires recording access times or maintaining a stack) |
| Clock | Approximate LRU based on a reference bit | Simple to implement with good performance | Only an approximation of LRU |
FIFO Example
Page reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 (3 frames)
Ref:    7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
Frame1: 7  7  7  2  2  2  2  4  4  4  0  0  0  0  0  0  0  7  7  7
Frame2:    0  0  0  0  3  3  3  2  2  2  2  2  1  1  1  1  1  0  0
Frame3:       1  1  1  1  0  0  0  3  3  3  3  3  2  2  2  2  2  1
Fault:  *  *  *  *     *  *  *  *  *  *        *  *        *  *  *
Number of page faults: 15
Belady's Anomaly
The FIFO algorithm can exhibit a counterintuitive phenomenon where allocating more frames actually increases the page fault rate. This is known as Belady's Anomaly. LRU and OPT do not suffer from this issue (they belong to the class of stack algorithms).
LRU Example
Page reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 (3 frames)
Ref:    7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
Frame1: 7  7  7  2  2  2  2  4  4  4  0  0  0  1  1  1  1  1  1  1
Frame2:    0  0  0  0  0  0  0  0  3  3  3  3  3  3  0  0  0  0  0
Frame3:       1  1  1  3  3  3  2  2  2  2  2  2  2  2  2  7  7  7
Fault:  *  *  *  *     *     *  *  *  *        *     *     *
Number of page faults: 12
LRU's page fault count (12) is lower than FIFO's (15).
Clock Algorithm
The Clock algorithm uses a circular linked list and a pointer (clock hand):
- Each frame has a reference bit that is set to 1 when the page is accessed
- When a replacement is needed, the pointer scans:
- If reference bit = 1, clear it to 0 (give a second chance) and advance the pointer
- If reference bit = 0, select this page for replacement
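A compact C sketch of this scan; NFRAMES, the two arrays, and the hand are illustrative globals, and the caller is assumed to have determined that a replacement is needed.

```c
// Clock (second-chance) replacement over a circular buffer of frames.
#define NFRAMES 64

int page[NFRAMES];      // which page each frame currently holds
int refbit[NFRAMES];    // reference bit, set to 1 whenever the page is accessed
int hand = 0;           // current clock-hand position

// Pick a victim frame, load new_page into it, and return the frame index.
int clock_replace(int new_page) {
    for (;;) {
        if (refbit[hand] == 0) {          // no second chance left: evict
            int victim = hand;
            page[victim] = new_page;
            refbit[victim] = 1;           // newly loaded page counts as referenced
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        refbit[hand] = 0;                 // give this page a second chance
        hand = (hand + 1) % NFRAMES;      // advance the hand
    }
}
```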
Thrashing
When the system has too many processes and each is allocated too few frames, page faults occur frequently. Processes spend most of their time performing page replacements (I/O), and CPU utilization actually drops. This phenomenon is called thrashing.
Cause: A process's working set exceeds the number of physical frames allocated to it.
Solutions:
- Working Set Model: Track the set of pages each process accesses within a time window \(\Delta\) (the working set), and ensure each process is allocated enough frames to hold its working set
- Page Fault Frequency (PFF): Monitor each process's page fault rate. If the rate is too high, allocate more frames; if too low, reclaim some frames
- Reduce the degree of multiprogramming (decrease the number of concurrent processes)
File System
File System Layer Hierarchy
Layers from top to bottom:
┌─────────────────────────────────┐
│ Application                     │
├─────────────────────────────────┤
│ Logical File System             │ ← manages metadata and directory structure
├─────────────────────────────────┤
│ File-Organization Module        │ ← logical block number → physical block number
├─────────────────────────────────┤
│ Basic File System               │ ← issues I/O requests to device drivers
├─────────────────────────────────┤
│ I/O Control                     │ ← device drivers, interrupt handling
├─────────────────────────────────┤
│ Devices                         │ ← physical HDD / SSD
└─────────────────────────────────┘
Directory Structures
| Structure | Description | Characteristics |
|---|---|---|
| Single-level directory | All files reside in the same directory | Simple but file names must be unique |
| Two-level directory | One directory per user | Resolves naming conflicts but inflexible |
| Tree-structured directory | Hierarchical directory structure | Most common; supports path names (absolute/relative) |
| Acyclic-graph directory (DAG) | Allows sharing of files/subdirectories (hard links, symbolic links) | Supports file sharing; must handle dangling references |
File Allocation Methods
| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Contiguous allocation | A file occupies contiguous blocks on disk | Fast sequential and random access | External fragmentation; file size is hard to grow dynamically |
| Linked allocation | Each block contains a pointer to the next block | No external fragmentation; file size can grow dynamically | Only supports sequential access; pointers consume space |
| Indexed allocation | A dedicated index block stores pointers to all data blocks | Supports direct access; no external fragmentation | The index block itself consumes space |
FAT and inode
- FAT (File Allocation Table): A variant of linked allocation. All block pointers are stored centrally in a single table (the FAT), enabling random access.
- inode (Unix/Linux): An extension of indexed allocation. An inode contains direct pointers (12), a single indirect pointer, a double indirect pointer, and a triple indirect pointer, supporting very large files.
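A worked example of inode capacity, assuming 4 KiB blocks and 4-byte block pointers (so each indirect block holds 1024 pointers); the parameters are illustrative rather than tied to a specific filesystem:

\(12 \times 4\,KiB + 1024 \times 4\,KiB + 1024^2 \times 4\,KiB + 1024^3 \times 4\,KiB \approx 48\,KiB + 4\,MiB + 4\,GiB + 4\,TiB\)

So with these parameters a single inode can address a file of roughly 4 TiB, dominated by the triple-indirect pointer.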
Free Space Management
| Method | Description |
|---|---|
| Bit Map (Bit Vector) | Each block is represented by 1 bit (0 = free, 1 = allocated). Convenient for finding contiguous free blocks |
| Free List | All free blocks are linked in a linked list. Allocating the first free block is fast, but finding contiguous space is slow |
| Grouping | Free blocks are grouped; the first free block stores addresses of several other free blocks |
| Counting | Records the starting address and count of contiguous free blocks |
Disk Scheduling Algorithms
The goal of disk scheduling: minimize the total distance of head movement (seek time).
Assume the head is currently at track 53. Request queue: 98, 183, 37, 122, 14, 124, 65, 67. Track range: 0–199.
| Algorithm | Strategy | Characteristics |
|---|---|---|
| FCFS | Service requests in arrival order | Fair, but large total head movement |
| SSTF (Shortest Seek Time First) | Select the request closest to the current head position | Small total movement, but may cause starvation |
| SCAN (Elevator Algorithm) | Head moves in one direction to the end, then reverses | More uniform waiting time |
| C-SCAN (Circular SCAN) | Head moves in one direction to the end, then jumps back to the start | Even more uniform waiting time |
SCAN Example
Head starts at 53, moving toward 0:
Head path: 53 → 37 → 14 → 0 → 65 → 67 → 98 → 122 → 124 → 183
Total head movement: 53 + 183 = 236 tracks
C-SCAN Example
Head starts at 53, moving toward 199:
Head path: 53 → 65 → 67 → 98 → 122 → 124 → 183 → 199 → 0 → 14 → 37
Total head movement: (199 - 53) + (199 - 0) + 37 = 382 tracks
Modern Disk Scheduling
Modern operating systems typically use more sophisticated schedulers that balance fairness, latency, and throughput (e.g., Linux's Deadline and BFQ schedulers; the older CFQ has been removed in favor of multi-queue schedulers). Since SSDs have no mechanical seeking, the significance of disk scheduling is greatly reduced.
I/O Management
I/O Methods
| I/O Method | Mechanism | Advantages | Disadvantages |
|---|---|---|---|
| Polling (Busy Waiting) | The CPU continuously checks the I/O device's status register | Simple to implement | Wastes CPU time; low efficiency |
| Interrupt-Driven I/O | The device sends an interrupt signal to the CPU upon I/O completion | CPU can perform other tasks while waiting | Each byte/word requires an interrupt; frequent interrupts incur high overhead |
| DMA (Direct Memory Access) | The DMA controller transfers data directly between the device and memory; interrupts the CPU only when done | Minimal CPU burden; suitable for bulk data transfers | Requires DMA hardware; competes with the CPU for the bus |
Polling:   CPU ──keeps checking──> device status
Interrupt: CPU ──does other work──> device completes ──interrupt──> CPU handles it
DMA:       CPU ──starts DMA──> DMA controller transfers the data ──completion interrupt──> CPU
Interrupt Handling Procedure
- The device completes I/O and sends an interrupt request to the CPU via the interrupt controller
- The CPU responds to the interrupt after finishing the current instruction
- The current process context is saved (PC, registers, etc.)
- The CPU jumps to the appropriate interrupt service routine (ISR) via the interrupt vector table
- The ISR executes (e.g., copying data from the device buffer to the kernel buffer)
- The context is restored, and execution resumes at the interrupted program
Buffering Strategies
Buffering is used to reconcile the speed difference between the CPU and I/O devices.
| Strategy | Description |
|---|---|
| No buffering | Data is transferred directly between user space and the device. Least efficient |
| Single buffer | A single buffer is set up in kernel space. The device writes to the buffer while the CPU processes the previous batch |
| Double buffer | Two buffers are used alternately. The device writes to one while the CPU reads from the other |
| Circular buffer | Multiple buffers arranged in a ring queue. Suitable for the producer-consumer model |
Spooling
Spooling (Simultaneous Peripheral Operations On-Line) virtualizes exclusive devices (such as printers) into shared devices. Output from multiple processes is first written to a spool area on disk, and the spooling system sends it to the device in order. The most common example: the print queue.
Interrupt Handling
Interrupt Types
| Type | Source | Examples |
|---|---|---|
| Hardware Interrupt | External devices | Keyboard input, timer, network packet arrival |
| Software Interrupt (Trap) | Program instructions | System call (syscall), breakpoint |
| Exception | CPU internal | Division by zero, page fault, segmentation fault |
Interrupt Handling Procedure
- The CPU saves the current context (PC, registers, status word)
- Looks up the ISR (Interrupt Service Routine) address via the interrupt vector
- Jumps to and executes the ISR
- After the ISR completes, restores the context and resumes the original program
Interrupt Priority: Higher-priority interrupts can preempt lower-priority ISRs (nested interrupts).
Interrupts vs. Polling
| Method | Advantages | Disadvantages |
|---|---|---|
| Interrupts | CPU does not idle-wait; high efficiency | Context switch overhead |
| Polling | Simple; deterministic latency | Wastes CPU time |
Containers and Virtualization
Virtual Machines (VM)
Run multiple guest OSes on physical hardware via a hypervisor:
- Type 1 (bare-metal): VMware ESXi, KVM, Xen
- Type 2 (hosted): VirtualBox, VMware Workstation
Containers
Containers share the host OS kernel and achieve isolation through namespaces and cgroups:
- Namespace: isolates PIDs, network, filesystem, users
- Cgroup: limits CPU, memory, and I/O usage
Docker
Dockerfile → Image → Container
- Image: A read-only template built in layers (Union FS)
- Container: A running instance of an image
- Docker Compose: Multi-container orchestration
VM vs. Container
| Dimension | Virtual Machine | Container |
|---|---|---|
| Isolation level | Full (independent kernel) | Process-level (shared kernel) |
| Startup time | Minutes | Seconds |
| Resource overhead | Large (GB-scale) | Small (MB-scale) |
| Security | Stronger | Weaker (shared kernel) |
See Version Control and CI/CD for Docker practices in development.
Relations to Other Topics
- Computer Architecture Overview provides the hardware background for interrupts, caches, the MMU, and privilege levels that operating systems rely on.
- Computer Networks extends the socket abstraction, protocol stack, and communication model from a single machine to multi-machine environments.
- Distributed Systems shows how processes, threads, synchronization, and fault handling scale from one OS instance to a cluster.
- Version Control and CI/CD shows how containers, process isolation, and deployment automation are applied in engineering practice.
References
- Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. Operating System Concepts. Wiley.
- Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. Operating Systems: Three Easy Pieces.
- Robert Love. Linux Kernel Development. Addison-Wesley.