Operating System
An operating system (OS) is the most fundamental piece of system software in a computer system. It manages hardware resources and provides a unified abstraction layer for higher-level applications. The core responsibilities of an OS include: process and thread management (CPU time allocation and scheduling), memory management (mapping between virtual address spaces and physical memory), file systems (persistent storage and organization of data), I/O management (device drivers and buffering strategies), and synchronization and security (concurrency control and privilege isolation).
Understanding operating systems is foundational to grasping computer architecture, parallel computing, distributed systems, and even deep learning training frameworks (e.g., GPU scheduling, GPU memory management). These notes cover the core topics of operating systems and are suitable for course review and interview preparation.
Why It Belongs to Computer Science
Operating systems belong to computer science not because they "perform more calculations," but because they study how computation is organized on a single machine. An OS turns raw hardware resources into programmable abstractions such as process / thread, virtual memory, file system, and socket, and uses scheduling, synchronization, isolation, and protection to let many programs run correctly and efficiently on the same machine.
It certainly overlaps with computer engineering, especially around interrupts, the MMU, device drivers, and real-time systems. But its central questions are still about the abstraction, allocation, and control of computational resources, which is why it fits most naturally under Computer Science.
Process Management
Process vs. Thread
| Dimension | Process | Thread |
|---|---|---|
| Definition | Basic unit of resource allocation | Basic unit of CPU scheduling |
| Address space | Has its own independent address space | Shares the address space of its parent process |
| Resource overhead | High cost to create/destroy | Low cost to create/destroy |
| Communication | Requires IPC mechanisms (pipes, shared memory, etc.) | Can directly read/write shared variables |
| Context switch cost | High (requires switching page tables, etc.) | Low |
| Safety | A crash in one process does not affect others | A crash in one thread may bring down the entire process |
| Concurrency | Multi-process concurrency | Multi-thread concurrency; threads within the same process share the heap |
User-Level Threads vs. Kernel-Level Threads
- User-Level Thread: Managed by a user-space thread library; the kernel is unaware of them. Fast to switch but cannot leverage multiple cores.
- Kernel-Level Thread: Managed and scheduled by the kernel; can exploit multi-core parallelism, but switching overhead is higher.
- Mapping models: Many-to-One, One-to-One, Many-to-Many.
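As a concrete illustration of the one-to-one model, the sketch below uses POSIX threads, which Linux implements by backing each pthread with a kernel thread, so the two workers can run on different cores in parallel. The file name in the compile hint is illustrative.

```c
// Minimal pthread example (compile with: gcc demo.c -pthread).
// On Linux, each pthread maps 1:1 onto a kernel-level thread.
#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld running\n", id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1);
    pthread_create(&t2, NULL, worker, (void *)2);
    pthread_join(t1, NULL);   // wait for both workers to finish
    pthread_join(t2, NULL);
    return 0;
}
```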
Process State Transitions
A process goes through the following states during its lifecycle:
                +---------+
                |   New   |
                +----+----+
                     | admitted
                     v
                +---------+  scheduler dispatch   +----------+
      +-------->|  Ready  |---------------------->| Running  |----+
      |         |         |<----------------------|          |    |
      |         +---------+  preempt / quantum    +----+-----+    |
      |                        expires                 |          | exit
      |                                    I/O request |          v
      | I/O or event                       or event    |    +------------+
      | complete       +----------+        wait        |    | Terminated |
      +----------------| Waiting  |<-------------------+    +------------+
                       +----------+
- New → Ready: The process is created and, once resources are allocated, enters the ready queue
- Ready → Running: The scheduler selects the process and assigns it the CPU
- Running → Waiting: The process issues an I/O request or waits for an event
- Running → Ready: The time quantum expires (preemption) or a higher-priority process arrives
- Waiting → Ready: The I/O operation completes or the awaited event occurs
- Running → Terminated: The process finishes execution or is forcibly terminated
Process Control Block (PCB)
The PCB is the core data structure used by the OS to describe and manage a process. Each process has a unique PCB.
| Field | Description |
|---|---|
| PID (Process ID) | Unique identifier for the process |
| Process state | New / Ready / Running / Waiting / Terminated |
| Program Counter (PC) | Address of the next instruction to execute |
| CPU registers | Current values of all registers |
| CPU scheduling info | Priority, scheduling queue pointers |
| Memory management info | Page table base address, segment table, address space boundaries |
| I/O status info | List of open files, allocated I/O devices |
| Accounting info | CPU time used, time limits, etc. |
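To make the table concrete, the PCB can be pictured as a C struct. The sketch below is purely illustrative; the field names are hypothetical, and a real kernel's equivalent (e.g., Linux's task_struct) contains far more fields.

```c
// Illustrative PCB sketch; field names are hypothetical.
typedef enum { NEW, READY, RUNNING, WAITING, TERMINATED } proc_state_t;

typedef struct pcb {
    int            pid;             // unique process identifier
    proc_state_t   state;           // current scheduling state
    unsigned long  pc;              // saved program counter
    unsigned long  regs[32];        // saved general-purpose registers
    int            priority;        // CPU scheduling info
    unsigned long  page_table_base; // memory management info
    int            open_files[16];  // I/O status: open file descriptors
    unsigned long  cpu_time_used;   // accounting info
    struct pcb    *next;            // ready/wait queue linkage
} pcb_t;
```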
Context Switch
A context switch is the process of saving the state of the currently running process and restoring the state of the next process when the CPU switches from one process to another.
Steps involved:
- Save the current process's PCB (registers, PC, stack pointer, etc.)
- Update the current process's state (Running → Ready / Waiting)
- Place the current process's PCB in the appropriate queue
- Select the next process to run (scheduling decision)
- Restore the new process's PCB
- Update the new process's state (Ready → Running)
- Jump to the new process's PC and resume execution
Context Switch Overhead
A context switch is pure system overhead — the system performs no "useful" work during the switch. Sources of overhead include:
- Saving/restoring registers
- Flushing the TLB (page table cache)
- Flushing CPU caches (cache pollution)
- Pipeline drain
A typical context switch takes on the order of microseconds (usually 1–10 µs).
Inter-Process Communication (IPC)
| IPC Mechanism | Description | Characteristics |
|---|---|---|
| Pipe | Half-duplex, unidirectional data flow; includes anonymous pipes and named pipes (FIFOs) | Simple to use, but anonymous pipes are limited to related processes |
| Message Queue | A linked list of messages maintained in the kernel; processes can send/receive messages | Supports message priorities; no explicit synchronization needed |
| Shared Memory | Multiple processes map the same physical memory region into their respective address spaces | Fastest method, but requires a synchronization mechanism (e.g., semaphores) |
| Semaphore | A counter used to control concurrent access to shared resources | Primarily used for synchronization, not data transfer |
| Signal | Asynchronous notification mechanism, e.g., SIGKILL, SIGTERM, SIGINT | Low overhead, but limited in the amount of information conveyed |
| Socket | Supports communication between different hosts | Most versatile; can be used for network communication |
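As a minimal illustration of the pipe mechanism from the table above, the following C sketch creates an anonymous pipe and forks a child that reads what the parent writes:

```c
// Anonymous pipe between related processes: parent writes, child reads.
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fd[2];                        // fd[0] = read end, fd[1] = write end
    char buf[64];
    if (pipe(fd) == -1) return 1;

    if (fork() == 0) {                // child: reader
        close(fd[1]);                 // close unused write end
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            printf("child got: %s\n", buf);
        }
        close(fd[0]);
        return 0;
    }
    close(fd[0]);                     // parent: writer
    const char *msg = "hello via pipe";
    write(fd[1], msg, strlen(msg));
    close(fd[1]);                     // EOF for the reader
    wait(NULL);
    return 0;
}
```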
CPU Scheduling
Scheduling Criteria
| Criterion | Definition | Optimization Goal |
|---|---|---|
| CPU Utilization | Proportion of time the CPU is busy | Maximize |
| Throughput | Number of processes completed per unit of time | Maximize |
| Turnaround Time | Total time from submission to completion: \(T_{turnaround} = T_{completion} - T_{arrival}\) | Minimize |
| Waiting Time | Total time spent waiting in the ready queue | Minimize |
| Response Time | Time from submission to the first response: \(T_{response} = T_{first\_run} - T_{arrival}\) | Minimize |
Preemptive vs. Non-Preemptive
- Non-preemptive: Once a process obtains the CPU, it runs until completion or voluntarily yields (e.g., for I/O).
- Preemptive: The scheduler can interrupt the current process at any time and reassign the CPU to another process. Most modern operating systems use preemptive scheduling.
Main Scheduling Algorithms
| Algorithm | Preemptive? | Advantages | Disadvantages | Use Cases |
|---|---|---|---|---|
| FCFS (First Come, First Served) | No | Simple to implement | Convoy effect — short processes wait behind long ones | Batch systems |
| SJF (Shortest Job First) | No | Optimal average waiting time (non-preemptive case) | May starve long jobs; requires advance knowledge of execution time | Batch systems with known runtimes |
| SRTF (Shortest Remaining Time First) | Yes | Optimal average waiting time (preemptive case) | Same as SJF; starvation is more severe | Same as SJF |
| Round Robin | Yes | Fair; good response time | Large quantum degenerates to FCFS; small quantum increases context switch overhead | Time-sharing systems |
| Priority | Optional | Flexible; supports differentiating task importance | Low-priority processes may starve (solvable via aging) | General-purpose systems |
| MLFQ (Multi-Level Feedback Queue) | Yes | Balances interactive and batch processes; adaptive | Complex to implement; parameter tuning is difficult | General-purpose OSes (Linux, Windows) |
Scheduling Algorithm Examples
FCFS Example
Process arrival order: P1(burst=24), P2(burst=3), P3(burst=3), all arriving at t=0.
Gantt chart:
| P1 | P2 | P3 |
0 24 27 30
- Average waiting time = (0 + 24 + 27) / 3 = 17
SJF Example
Process arrival order: P1(burst=6), P2(burst=8), P3(burst=7), P4(burst=3), all arriving at t=0.
Gantt chart:
| P4 | P1 | P3 | P2 |
0 3 9 16 24
- Average waiting time = (3 + 16 + 9 + 0) / 4 = 7
SRTF Example
| Process | Arrival Time | Burst |
|---|---|---|
| P1 | 0 | 8 |
| P2 | 1 | 4 |
| P3 | 2 | 2 |
Gantt chart:
| P1 | P2 | P3 | P2 | P1 |
0    1    2    4    7    14
- P1 waiting = (0 - 0) + (7 - 1) = 6, P2 waiting = (1 - 1) + (4 - 2) = 2, P3 waiting = (2 - 2) = 0
- Average waiting time = (6 + 2 + 0) / 3 = 2.67
Round Robin Example (time quantum q=4)
Process arrival order: P1(burst=24), P2(burst=3), P3(burst=3), all arriving at t=0.
Gantt chart:
| P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
0 4 7 10 14 18 22 26 30
- Average waiting time = ((30-24) + 4 + 7) / 3 = 5.67
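The quantum-by-quantum accounting above can be checked with a small simulation. The sketch below hard-codes this example; since all processes arrive at t=0, a simple cyclic scan of the ready queue reproduces the Gantt order.

```c
// Round-robin simulation (q = 4) for P1=24, P2=3, P3=3, all arriving at t=0.
#include <stdio.h>

int main(void) {
    int burst[] = {24, 3, 3};
    int n = 3, q = 4;
    int remaining[3], completion[3];
    for (int i = 0; i < n; i++) remaining[i] = burst[i];

    int t = 0, done = 0;
    while (done < n) {
        for (int i = 0; i < n; i++) {          // cycle through the ready queue
            if (remaining[i] == 0) continue;
            int slice = remaining[i] < q ? remaining[i] : q;
            t += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) { completion[i] = t; done++; }
        }
    }
    double total_wait = 0;
    for (int i = 0; i < n; i++) {
        int wait = completion[i] - burst[i];   // all arrivals are at t=0
        printf("P%d waiting = %d\n", i + 1, wait);
        total_wait += wait;
    }
    printf("average waiting = %.2f\n", total_wait / n);  // prints 5.67
    return 0;
}
```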
Choosing the Time Quantum for Round Robin
- If \(q\) is too large → degenerates to FCFS
- If \(q\) is too small → context switch overhead increases
- Rule of thumb: \(q\) should be large enough so that 80% of CPU bursts complete within a single quantum
Priority Scheduling Example
| Process | Burst | Priority (lower number = higher priority) |
|---|---|---|
| P1 | 10 | 3 |
| P2 | 1 | 1 |
| P3 | 2 | 4 |
| P4 | 1 | 5 |
| P5 | 5 | 2 |
Gantt chart:
| P2| P5 | P1 | P3 | P4|
0 1 6 16 18 19
- Average waiting time = (6 + 0 + 16 + 18 + 1) / 5 = 8.2
MLFQ (Multi-Level Feedback Queue)
Core rules of MLFQ:
- If Priority(A) > Priority(B), run A
- If Priority(A) = Priority(B), run in round-robin
- New processes enter the highest-priority queue
- If a process exhausts its time allotment at the current level, it is demoted to the next lower queue
- After a time period \(S\) (boost interval), all processes are promoted to the highest-priority queue (to prevent starvation)
Queue 0 (highest priority, q=8ms):  | P_new | ...
Queue 1 (medium priority, q=16ms):  | P_cpu_bound | ...
Queue 2 (lowest priority, FCFS):    | P_long_running | ...
Synchronization
The Critical Section Problem
A critical section is a segment of code in which a process accesses shared resources. At most one process may be inside its critical section at any given time.
A correct solution to the critical section problem must satisfy three conditions:
- Mutual Exclusion: At most one process may be in its critical section at any time
- Progress: If no process is in its critical section and some processes wish to enter, the decision of which one enters next cannot be postponed indefinitely
- Bounded Waiting: There must be an upper bound on the number of times other processes can enter their critical sections after a process has requested entry
Peterson's Solution
Peterson's algorithm is a classic software solution for mutual exclusion between two processes.
// Code for process i (the other process is j = 1 - i)
bool flag[2]; // flag[i] == true means process i wants to enter its critical section
int turn;     // whose turn it is to enter the critical section

// Process i enters its critical section
flag[i] = true;
turn = j;                        // politely yield to the other process
while (flag[j] && turn == j)
    ;                            // busy waiting

// --- critical section ---

flag[i] = false;                 // exit the critical section
Limitations of Peterson's Solution
Peterson's algorithm is not guaranteed to be correct on modern multi-core processors because the CPU and compiler may reorder instructions. In practice, memory barriers or hardware atomic instructions are required.
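For illustration, here is a sketch of Peterson's algorithm written with C11 atomics. The default sequentially consistent ordering prevents the store/load reordering that breaks the plain-variable version; even so, production code should prefer mutexes or atomic read-modify-write instructions.

```c
// Peterson's algorithm with C11 atomics (seq_cst ordering by default).
#include <stdatomic.h>
#include <stdbool.h>

atomic_bool flag[2];
atomic_int  turn;

void lock(int i) {
    int j = 1 - i;
    atomic_store(&flag[i], true);   // I want to enter
    atomic_store(&turn, j);         // but politely yield first
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ;                           // busy-wait (spin)
}

void unlock(int i) {
    atomic_store(&flag[i], false);  // leave the critical section
}
```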
Mutex Locks and Semaphores
Mutex Lock:
acquire(lock);   // blocks if the lock is already held
// --- critical section ---
release(lock);   // release the lock
Semaphore: An integer variable accessed through two atomic operations — wait() (P operation) and signal() (V operation).
wait(S) {        // P operation
    while (S <= 0)
        ;        // busy wait (or block)
    S--;
}

signal(S) {      // V operation
    S++;
}
| Type | Description |
|---|---|
| Binary Semaphore | Value is 0 or 1; behaves like a mutex lock (though a mutex additionally has the notion of an owner) |
| Counting Semaphore | Value can be any non-negative integer; used to control concurrent access to a finite number of resources |
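A usage sketch with the POSIX semaphore API, where sem_wait is the P operation and sem_post is the V operation. Here a counting semaphore limits a region to three concurrent threads; the thread count and slot count are illustrative.

```c
// Counting semaphore with the POSIX API, limiting concurrency to 3.
#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

sem_t slots;

void *worker(void *arg) {
    sem_wait(&slots);               // P: acquire a slot (blocks if none free)
    printf("thread %ld in limited region\n", (long)arg);
    sem_post(&slots);               // V: release the slot
    return NULL;
}

int main(void) {
    sem_init(&slots, 0, 3);         // initial value 3, private to this process
    pthread_t t[5];
    for (long i = 0; i < 5; i++) pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 5; i++) pthread_join(t[i], NULL);
    sem_destroy(&slots);
    return 0;
}
```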
Classic Synchronization Problems
Producer-Consumer Problem
Also known as the Bounded Buffer Problem.
semaphore mutex = 1;   // mutual exclusion on the buffer
semaphore empty = N;   // number of empty slots
semaphore full = 0;    // number of filled slots

// Producer
void producer() {
    while (true) {
        item = produce_item();
        wait(empty);       // wait for an empty slot
        wait(mutex);       // enter the critical section
        insert(item);
        signal(mutex);     // leave the critical section
        signal(full);      // notify a consumer
    }
}

// Consumer
void consumer() {
    while (true) {
        wait(full);        // wait for data
        wait(mutex);       // enter the critical section
        item = remove();
        signal(mutex);     // leave the critical section
        signal(empty);     // notify a producer
        consume(item);
    }
}
Order of wait Operations Matters
The order of wait(empty) and wait(mutex) must not be reversed; otherwise, a deadlock may occur: the producer holds the mutex but the buffer is full, so it blocks on empty, while the consumer blocks on mutex and cannot consume.
Readers-Writers Problem
- Multiple readers may read simultaneously
- A writer must have exclusive access
semaphore rw_mutex = 1;   // reader/writer exclusion
semaphore mutex = 1;      // protects read_count
int read_count = 0;

// Writer
void writer() {
    wait(rw_mutex);
    // --- write ---
    signal(rw_mutex);
}

// Reader
void reader() {
    wait(mutex);
    read_count++;
    if (read_count == 1)
        wait(rw_mutex);   // first reader locks out writers
    signal(mutex);
    // --- read ---
    wait(mutex);
    read_count--;
    if (read_count == 0)
        signal(rw_mutex); // last reader lets writers back in
    signal(mutex);
}
Variants of the Readers-Writers Problem
- First variant (readers' priority): The solution above; may starve writers.
- Second variant (writers' priority): No new readers may enter while a writer is waiting.
- In practice, read-write locks (rwlock) are commonly used, as sketched below.
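A minimal usage sketch of the POSIX pthread_rwlock API, assuming a single shared integer: many readers may hold the lock at once, while a writer holds it exclusively.

```c
// Reader-writer lock with the POSIX API.
#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_value;

int read_value(void) {
    pthread_rwlock_rdlock(&rwlock);   // shared (read) lock
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void write_value(int v) {
    pthread_rwlock_wrlock(&rwlock);   // exclusive (write) lock
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}
```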
Dining Philosophers Problem
Five philosophers sit around a circular table with one chopstick between each pair. Philosophers alternate between thinking and eating; eating requires picking up both the left and right chopstick.
Naive solution (may deadlock):
semaphore chopstick[5] = {1, 1, 1, 1, 1};

void philosopher(int i) {
    while (true) {
        think();
        wait(chopstick[i]);         // pick up left chopstick
        wait(chopstick[(i+1)%5]);   // pick up right chopstick
        eat();
        signal(chopstick[(i+1)%5]);
        signal(chopstick[i]);
    }
}
Strategies to avoid deadlock:
- Allow at most 4 philosophers to pick up chopsticks simultaneously
- Odd-numbered philosophers pick up the left chopstick first; even-numbered pick up the right first (breaks circular wait; see the sketch after this list)
- Use trylock: if a chopstick cannot be acquired, release the one already held (breaks hold-and-wait)
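A sketch of the asymmetric-order strategy, reusing the wait/signal pseudocode from above. Because neighbors disagree on which chopstick to grab first, a cycle of philosophers each holding one chopstick cannot form.

```c
// Asymmetric acquisition order: even philosophers take the right
// chopstick first, odd philosophers the left, breaking circular wait.
void philosopher(int i) {
    int left = i, right = (i + 1) % 5;
    while (true) {
        think();
        if (i % 2 == 0) {            // even: right first
            wait(chopstick[right]);
            wait(chopstick[left]);
        } else {                     // odd: left first
            wait(chopstick[left]);
            wait(chopstick[right]);
        }
        eat();
        signal(chopstick[left]);
        signal(chopstick[right]);
    }
}
```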
Deadlock
Four Necessary Conditions for Deadlock
A deadlock can occur only when all four of the following conditions hold simultaneously:
- Mutual Exclusion: A resource cannot be shared by multiple processes at the same time
- Hold and Wait: A process holds at least one resource while waiting to acquire additional resources
- No Preemption: Resources cannot be forcibly taken away; they can only be released voluntarily by the holder
- Circular Wait: There exists a set of processes \(\{P_0, P_1, \dots, P_n\}\) such that \(P_i\) is waiting for a resource held by \(P_{(i+1) \mod (n+1)}\)
Deadlock Handling Strategies
| Strategy | Method | Description |
|---|---|---|
| Prevention | Break one of the four necessary conditions | E.g., require a process to request all resources at once (breaks hold-and-wait); number resources and request them in order (breaks circular wait) |
| Avoidance | Dynamically check whether resource allocation is safe | Banker's algorithm; requires advance knowledge of maximum resource demands |
| Detection | Allow deadlocks to occur; detect them periodically | Resource Allocation Graph; Wait-for Graph |
| Recovery | Take action after detecting a deadlock | Terminate processes (one by one or all at once); resource preemption (roll back process state) |
What Real Systems Do
Most operating systems (e.g., Linux, Windows) adopt the ostrich approach: they ignore the deadlock problem and leave it to application developers. The overhead of deadlock prevention/avoidance often exceeds the cost of occasional deadlocks.
Banker's Algorithm
The Banker's algorithm is used for deadlock avoidance. Its core idea is: before allocating resources, check whether the system would remain in a safe state after the allocation.
Data structures (\(n\) processes, \(m\) resource types):
- \(Available[m]\): Number of currently available instances of each resource type
- \(Max[n][m]\): Maximum demand of each process for each resource type
- \(Allocation[n][m]\): Resources currently allocated to each process
- \(Need[n][m]\): Remaining resource needs of each process, where \(Need[i][j] = Max[i][j] - Allocation[i][j]\)
Safety check algorithm (implemented in the sketch after this list):
1. Let \(Work = Available\) and \(Finish[i] = false\) for all \(i\)
2. Find a process \(P_i\) such that \(Finish[i] = false\) and \(Need[i] \leq Work\)
3. If such a \(P_i\) exists, set \(Work = Work + Allocation[i]\) and \(Finish[i] = true\), then go back to step 2
4. If \(Finish[i] = true\) for all processes, the system is in a safe state
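A minimal C sketch of this safety check, assuming n = 5 processes and m = 3 resource types (the sizes are illustrative):

```c
// Banker's safety check: returns 1 if the current state is safe.
#include <string.h>

#define N 5   // number of processes (illustrative)
#define M 3   // number of resource types (illustrative)

int is_safe(int available[M], int need[N][M], int allocation[N][M]) {
    int work[M], finish[N] = {0};
    memcpy(work, available, sizeof(work));

    for (int count = 0; count < N; count++) {
        int found = 0;
        for (int i = 0; i < N && !found; i++) {
            if (finish[i]) continue;
            int ok = 1;                       // can P_i's need be met by work?
            for (int j = 0; j < M; j++)
                if (need[i][j] > work[j]) { ok = 0; break; }
            if (ok) {                         // pretend P_i runs to completion
                for (int j = 0; j < M; j++) work[j] += allocation[i][j];
                finish[i] = 1;
                found = 1;
            }
        }
        if (!found) break;                    // no runnable process remains
    }
    for (int i = 0; i < N; i++)
        if (!finish[i]) return 0;             // someone can never finish: unsafe
    return 1;                                 // a safe sequence exists
}
```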
Safe State vs. Unsafe State
- Safe state: There exists a safe sequence in which all processes can complete successfully. A safe state guarantees no deadlock.
- Unsafe state: No safe sequence exists. An unsafe state does not necessarily mean deadlock, but there is a risk of deadlock.
Memory Management
Contiguous vs. Non-Contiguous Allocation
| Allocation Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Contiguous | Each process occupies a contiguous block of memory | Simple to implement; high access efficiency | External fragmentation |
| Non-contiguous | A process can be scattered across different memory locations | Reduces fragmentation | Requires additional address mapping mechanisms |
Contiguous allocation strategies:
- First Fit: Scan from the beginning and use the first free block that is large enough
- Best Fit: Find the smallest free block that is large enough (minimizes leftover fragmentation)
- Worst Fit: Use the largest free block
Types of Fragmentation
- External Fragmentation: The total free memory is sufficient, but it is non-contiguous and cannot be allocated to a process. Can be resolved through compaction.
- Internal Fragmentation: The memory allocated to a process exceeds its actual needs; the excess is wasted. Common in paging.
Paging
Paging is a non-contiguous memory allocation scheme:
- Physical memory is divided into fixed-size blocks called frames
- The logical address space is divided into blocks of the same size called pages
- A page table maps logical page numbers to physical frame numbers
Address translation:
A logical address consists of two parts: a page number (p) and a page offset (d). The physical address is computed as
\(PA = f \times 2^n + d\)
where \(f = PageTable[p]\) is the corresponding physical frame number and \(2^n\) is the page size.
If the page size is \(2^n\) bytes and the logical address space is \(2^m\) bytes, then:
- Page offset = lower \(n\) bits
- Page number = upper \((m - n)\) bits
- Number of page table entries = \(2^{m-n}\)
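These bit manipulations translate directly into code. A sketch assuming 32-bit addresses, 4 KiB pages (n = 12), and a flat single-level page table passed in as an array:

```c
// Split a logical address into page number and offset, then form the
// physical address from the frame number found in the page table.
#include <stdint.h>

#define PAGE_BITS 12                         // 4 KiB pages (n = 12)
#define PAGE_SIZE (1u << PAGE_BITS)

uint32_t translate(uint32_t vaddr, const uint32_t *page_table) {
    uint32_t p = vaddr >> PAGE_BITS;         // page number = upper (m - n) bits
    uint32_t d = vaddr & (PAGE_SIZE - 1);    // offset = lower n bits
    uint32_t f = page_table[p];              // frame number from the page table
    return (f << PAGE_BITS) | d;             // physical address = f * 2^n + d
}
```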
TLB (Translation Lookaside Buffer)
The TLB is a high-speed cache for the page table (typically located in the CPU's MMU) that accelerates address translation.
Effective access time (EAT):
\(EAT = \alpha \times (t_{TLB} + t_{mem}) + (1 - \alpha) \times (t_{TLB} + 2 \times t_{mem})\)
where \(\alpha\) is the TLB hit rate, \(t_{TLB}\) is the TLB lookup time, and \(t_{mem}\) is the time for one memory access (a TLB miss costs one extra memory access to read the page table entry).
Typical TLB Parameters
- Number of TLB entries: 64 – 1024
- Hit rate: typically > 99%
- The TLB must be flushed on context switch (or ASIDs can be used to identify processes)
Multi-Level Page Table
When the logical address space is very large, a single-level page table would consume excessive memory. Multi-level page tables split the page table into layers, allocating page table memory only for address ranges actually in use.
Taking a two-level page table as an example:
Logical address: | level-1 page number p1 | level-2 page number p2 | offset d |
        ↓
level-1 page table[p1] → base address of the level-2 page table
        ↓
level-2 page table[p2] → physical frame number f
        ↓
physical address = (f, d)
- Advantage: Saves memory (page table entries for unused regions are not allocated)
- Disadvantage: Adds one additional memory access per level of the page table
Segmentation
Segmentation divides the logical address space according to the logical structure of the program into several segments, such as the code segment, data segment, stack segment, etc.
Logical address: (segment number s, offset within segment d)
- Segment table entry: (base address, limit)
- Address translation: If \(d < limit\), physical address = \(base + d\); otherwise, a segmentation fault occurs
| Comparison | Paging | Segmentation |
|---|---|---|
| Division basis | Physical division, fixed size | Logical division, variable size |
| Fragmentation type | Internal fragmentation | External fragmentation |
| Visible to user? | Transparent to the user | User-visible (divided by logical modules) |
| Sharing and protection | Page-level granularity (coarse) | Segment-level granularity (convenient for sharing and protection) |
Virtual Memory
Virtual memory allows a process to use an address space larger than the available physical memory. The core idea: only load the pages currently needed into physical memory; keep the rest on disk.
Demand Paging
A page is loaded from disk into memory only when it is accessed.
- Page Fault: Triggered when the accessed page is not in memory
- Handling procedure: trap to kernel → locate page on disk → load into a free frame → update page table → re-execute the faulting instruction
Page fault rate and effective access time:
\(EAT = (1 - p) \times t_{mem} + p \times t_{page\_fault}\)
where \(p\) is the page fault rate, and \(t_{page\_fault}\) is typically on the order of milliseconds (including disk I/O), far greater than \(t_{mem}\) (nanoseconds).
The Cost of Page Faults
A single page fault takes about 10 ms to handle (disk access), while a normal memory access takes about 100 ns. Thus, a page fault is roughly \(10^5\) times slower than a regular memory access. Even a 0.1% page fault rate can cause significant performance degradation.
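Plugging representative numbers into the formula above (assuming \(t_{mem} = 100\) ns, \(t_{page\_fault} = 10\) ms, and \(p = 0.001\)):

\(EAT = 0.999 \times 100\,ns + 0.001 \times 10\,ms \approx 100\,ns + 10\,\mu s \approx 10.1\,\mu s\)

So even a 0.1% fault rate makes the average memory access roughly 100 times slower.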
Page Replacement Algorithms
When physical memory is full and a new page must be loaded, one existing page must be selected for eviction (replacement).
| Algorithm | Strategy | Advantages | Disadvantages |
|---|---|---|---|
| OPT (Optimal) | Replace the page that will not be used for the longest time in the future | Lowest page fault rate (theoretically optimal) | Impossible to implement (requires knowledge of future access patterns) |
| FIFO (First In, First Out) | Replace the page that has been in memory the longest | Simple to implement | Poor performance; suffers from Belady's anomaly |
| LRU (Least Recently Used) | Replace the page that has not been accessed for the longest time | Performance close to OPT | High implementation overhead (requires recording access times or maintaining a stack) |
| Clock | Approximate LRU based on a reference bit | Simple to implement with good performance | Only an approximation of LRU |
FIFO Example
Page reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 (3 frames)
Ref:    7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
Frame1: 7  7  7  2  2  2  2  4  4  4  0  0  0  0  0  0  0  7  7  7
Frame2:    0  0  0  0  3  3  3  2  2  2  2  2  1  1  1  1  1  0  0
Frame3:       1  1  1  1  0  0  0  3  3  3  3  3  2  2  2  2  2  1
Fault:  *  *  *  *     *  *  *  *  *  *        *  *        *  *  *
Number of page faults: 15
Belady's Anomaly
The FIFO algorithm can exhibit a counterintuitive phenomenon where allocating more frames actually increases the page fault rate. This is known as Belady's Anomaly. LRU and OPT do not suffer from this issue (they belong to the class of stack algorithms).
LRU Example
Page reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 (3 frames)
Ref:    7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
Frame1: 7  7  7  2  2  2  2  4  4  4  0  0  0  1  1  1  1  1  1  1
Frame2:    0  0  0  0  0  0  0  0  3  3  3  3  3  3  0  0  0  0  0
Frame3:       1  1  1  3  3  3  2  2  2  2  2  2  2  2  2  7  7  7
Fault:  *  *  *  *     *     *  *  *  *        *     *     *
Number of page faults: 12
LRU's page fault count (12) is lower than FIFO's (15).
Clock Algorithm
The Clock algorithm uses a circular linked list and a pointer (clock hand):
- Each frame has a reference bit that is set to 1 when the page is accessed
- When a replacement is needed, the pointer scans:
- If reference bit = 1, clear it to 0 (give a second chance) and advance the pointer
- If reference bit = 0, select this page for replacement
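A compact C sketch of this scan; NFRAMES, the two arrays, and the hand are illustrative globals, and the caller is assumed to have determined that a replacement is needed.

```c
// Clock (second-chance) replacement over a circular buffer of frames.
#define NFRAMES 64

int page[NFRAMES];      // which page each frame currently holds
int refbit[NFRAMES];    // reference bit, set to 1 whenever the page is accessed
int hand = 0;           // current clock-hand position

// Pick a victim frame, load new_page into it, and return the frame index.
int clock_replace(int new_page) {
    for (;;) {
        if (refbit[hand] == 0) {          // no second chance left: evict
            int victim = hand;
            page[victim] = new_page;
            refbit[victim] = 1;           // newly loaded page counts as referenced
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        refbit[hand] = 0;                 // give this page a second chance
        hand = (hand + 1) % NFRAMES;      // advance the hand
    }
}
```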
Thrashing
When the system has too many processes and each is allocated too few frames, page faults occur frequently. Processes spend most of their time performing page replacements (I/O), and CPU utilization actually drops. This phenomenon is called thrashing.
Cause: A process's working set exceeds the number of physical frames allocated to it.
Solutions:
- Working Set Model: Track the set of pages each process accesses within a time window \(\Delta\) (the working set), and ensure each process is allocated enough frames to hold its working set
- Page Fault Frequency (PFF): Monitor each process's page fault rate. If the rate is too high, allocate more frames; if too low, reclaim some frames
- Reduce the degree of multiprogramming (decrease the number of concurrent processes)
File System
File System Layer Hierarchy
Layers from top to bottom:
┌─────────────────────────────────┐
│ Application                     │
├─────────────────────────────────┤
│ Logical File System             │ ← manages metadata and directory structure
├─────────────────────────────────┤
│ File-Organization Module        │ ← logical block number → physical block number
├─────────────────────────────────┤
│ Basic File System               │ ← issues I/O requests to device drivers
├─────────────────────────────────┤
│ I/O Control                     │ ← device drivers, interrupt handling
├─────────────────────────────────┤
│ Devices                         │ ← physical HDD / SSD
└─────────────────────────────────┘
Directory Structures
| Structure | Description | Characteristics |
|---|---|---|
| Single-level directory | All files reside in the same directory | Simple but file names must be unique |
| Two-level directory | One directory per user | Resolves naming conflicts but inflexible |
| Tree-structured directory | Hierarchical directory structure | Most common; supports path names (absolute/relative) |
| Acyclic-graph directory (DAG) | Allows sharing of files/subdirectories (hard links, symbolic links) | Supports file sharing; must handle dangling references |
File Allocation Methods
| Method | Description | Advantages | Disadvantages |
|---|---|---|---|
| Contiguous allocation | A file occupies contiguous blocks on disk | Fast sequential and random access | External fragmentation; file size is hard to grow dynamically |
| Linked allocation | Each block contains a pointer to the next block | No external fragmentation; file size can grow dynamically | Only supports sequential access; pointers consume space |
| Indexed allocation | A dedicated index block stores pointers to all data blocks | Supports direct access; no external fragmentation | The index block itself consumes space |
FAT and inode
- FAT (File Allocation Table): A variant of linked allocation. All block pointers are stored centrally in a single table (the FAT), enabling random access.
- inode (Unix/Linux): An extension of indexed allocation. An inode contains direct pointers (12), a single indirect pointer, a double indirect pointer, and a triple indirect pointer, supporting very large files.
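A worked example of inode capacity, assuming 4 KiB blocks and 4-byte block pointers (so each indirect block holds 1024 pointers); the parameters are illustrative rather than tied to a specific filesystem:

\(12 \times 4\,KiB + 1024 \times 4\,KiB + 1024^2 \times 4\,KiB + 1024^3 \times 4\,KiB \approx 48\,KiB + 4\,MiB + 4\,GiB + 4\,TiB\)

So with these parameters a single inode can address a file of roughly 4 TiB, dominated by the triple-indirect pointer.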
Free Space Management
| Method | Description |
|---|---|
| Bit Map (Bit Vector) | Each block is represented by 1 bit (0 = free, 1 = allocated). Convenient for finding contiguous free blocks |
| Free List | All free blocks are linked in a linked list. Allocating the first free block is fast, but finding contiguous space is slow |
| Grouping | Free blocks are grouped; the first free block stores addresses of several other free blocks |
| Counting | Records the starting address and count of contiguous free blocks |
Disk Scheduling Algorithms
The goal of disk scheduling: minimize the total distance of head movement (seek time).
Assume the head is currently at track 53. Request queue: 98, 183, 37, 122, 14, 124, 65, 67. Track range: 0–199.
| Algorithm | Strategy | Characteristics |
|---|---|---|
| FCFS | Service requests in arrival order | Fair, but large total head movement |
| SSTF (Shortest Seek Time First) | Select the request closest to the current head position | Small total movement, but may cause starvation |
| SCAN (Elevator Algorithm) | Head moves in one direction to the end, then reverses | More uniform waiting time |
| C-SCAN (Circular SCAN) | Head moves in one direction to the end, then jumps back to the start | Even more uniform waiting time |
SCAN Example
Head starts at 53, moving toward 0:
Head path: 53 → 37 → 14 → 0 → 65 → 67 → 98 → 122 → 124 → 183
Total head movement: 53 + 183 = 236 tracks
C-SCAN Example
Head starts at 53, moving toward 199:
Head path: 53 → 65 → 67 → 98 → 122 → 124 → 183 → 199 → 0 → 14 → 37
Total head movement: (199 - 53) + (199 - 0) + 37 = 382 tracks
Modern Disk Scheduling
Modern operating systems typically use more sophisticated schedulers that balance fairness, latency, and throughput (e.g., Linux's Deadline and BFQ schedulers; the older CFQ has been removed in favor of multi-queue schedulers). Since SSDs have no mechanical seeking, the significance of disk scheduling is greatly reduced.
I/O Management
I/O Methods
| I/O Method | Mechanism | Advantages | Disadvantages |
|---|---|---|---|
| Polling (Busy Waiting) | The CPU continuously checks the I/O device's status register | Simple to implement | Wastes CPU time; low efficiency |
| Interrupt-Driven I/O | The device sends an interrupt signal to the CPU upon I/O completion | CPU can perform other tasks while waiting | Each byte/word requires an interrupt; frequent interrupts incur high overhead |
| DMA (Direct Memory Access) | The DMA controller transfers data directly between the device and memory; interrupts the CPU only when done | Minimal CPU burden; suitable for bulk data transfers | Requires DMA hardware; competes with the CPU for the bus |
Polling:   CPU ──keeps checking──> device status
Interrupt: CPU ──does other work──> device completes ──interrupt──> CPU handles it
DMA:       CPU ──starts DMA──> DMA controller transfers the data ──completion interrupt──> CPU
Interrupt Handling Procedure
- The device completes I/O and sends an interrupt request to the CPU via the interrupt controller
- The CPU responds to the interrupt after finishing the current instruction
- The current process context is saved (PC, registers, etc.)
- The CPU jumps to the appropriate interrupt service routine (ISR) via the interrupt vector table
- The ISR executes (e.g., copying data from the device buffer to the kernel buffer)
- The context is restored, and execution resumes at the interrupted program
Buffering Strategies
Buffering is used to reconcile the speed difference between the CPU and I/O devices.
| Strategy | Description |
|---|---|
| No buffering | Data is transferred directly between user space and the device. Least efficient |
| Single buffer | A single buffer is set up in kernel space. The device writes to the buffer while the CPU processes the previous batch |
| Double buffer | Two buffers are used alternately. The device writes to one while the CPU reads from the other |
| Circular buffer | Multiple buffers arranged in a ring queue. Suitable for the producer-consumer model |
Spooling
Spooling (Simultaneous Peripheral Operations On-Line) virtualizes exclusive devices (such as printers) into shared devices. Output from multiple processes is first written to a spool area on disk, and the spooling system sends it to the device in order. The most common example: the print queue.
Interrupt Handling
Interrupt Types
| Type | Source | Examples |
|---|---|---|
| Hardware Interrupt | External devices | Keyboard input, timer, network packet arrival |
| Software Interrupt (Trap) | Program instructions | System call (syscall), breakpoint |
| Exception | CPU internal | Division by zero, page fault, segmentation fault |
Interrupt Handling Procedure
- The CPU saves the current context (PC, registers, status word)
- Looks up the ISR (Interrupt Service Routine) address via the interrupt vector
- Jumps to and executes the ISR
- After the ISR completes, restores the context and resumes the original program
Interrupt Priority: Higher-priority interrupts can preempt lower-priority ISRs (nested interrupts).
Interrupts vs. Polling
| Method | Advantages | Disadvantages |
|---|---|---|
| Interrupts | CPU does not idle-wait; high efficiency | Context switch overhead |
| Polling | Simple; deterministic latency | Wastes CPU time |
Containers and Virtualization
Virtual Machines (VM)
Run multiple guest OSes on physical hardware via a hypervisor:
- Type 1 (bare-metal): VMware ESXi, KVM, Xen
- Type 2 (hosted): VirtualBox, VMware Workstation
Containers
Containers share the host OS kernel and achieve isolation through namespaces and cgroups:
- Namespace: isolates PIDs, network, filesystem, users
- Cgroup: limits CPU, memory, and I/O usage
Docker
Dockerfile → Image → Container
- Image: A read-only template built in layers (Union FS)
- Container: A running instance of an image
- Docker Compose: Multi-container orchestration
VM vs. Container
| Dimension | Virtual Machine | Container |
|---|---|---|
| Isolation level | Full (independent kernel) | Process-level (shared kernel) |
| Startup time | Minutes | Seconds |
| Resource overhead | Large (GB-scale) | Small (MB-scale) |
| Security | Stronger | Weaker (shared kernel) |
See Version Control and CI/CD for Docker practices in development.
Relations to Other Topics
- Computer Architecture Overview provides the hardware background for interrupts, caches, the MMU, and privilege levels that operating systems rely on.
- Computer Networks extends the socket abstraction, protocol stack, and communication model from a single machine to multi-machine environments.
- Distributed Systems shows how processes, threads, synchronization, and fault handling scale from one OS instance to a cluster.
- Version Control and CI/CD shows how containers, process isolation, and deployment automation are applied in engineering practice.
References
- Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. Operating System Concepts. Wiley.
- Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. Operating Systems: Three Easy Pieces.
- Robert Love. Linux Kernel Development. Addison-Wesley.