Computer Architecture Overview
Computer architecture studies how abstract computation is executed on real machines, and how hardware systems trade off performance, energy, cost, and correctness. Its core objects include the instruction set architecture (ISA), microarchitecture, memory hierarchy, and parallel organization.
It belongs under computer science because the subject is not merely about physical components. It is about how computation is represented, organized, and accelerated by machines. In other words, architecture sits between software and circuits, studying how computation becomes an executable system. That makes it deeply connected to computer engineering while still being a core part of computer science.
Why It Belongs to Computer Science
| Perspective | What Architecture Studies | Why This Is a Computer Science Question |
|---|---|---|
| Abstraction | ISA, memory model, hierarchical interfaces | It studies how programs are interpreted and executed by machines |
| Performance | Pipelines, caches, parallelism, bandwidth | It studies how the same computation can be executed faster |
| Correctness | Consistency, exceptions, interrupts, privilege levels | It studies whether system behavior matches computational semantics |
1. Abstraction Layer Model
The core design philosophy of computer systems is abstraction through layering -- each layer hides the complexity of the layer below and provides a clean interface to the layer above.
```mermaid
graph TB
    A[Application] --> B[Operating System]
    B --> C[Instruction Set Architecture ISA]
    C --> D[Microarchitecture]
    D --> E[Functional Units]
    E --> F[Logic Gates]
    F --> G[Transistors]
    G --> H[Physics]
    style A fill:#e1f5fe
    style C fill:#fff9c4
    style D fill:#f3e5f5
    style H fill:#ffccbc
```
ISA (Instruction Set Architecture) is the critical contract layer between hardware and software:
- Software engineers program against the ISA (or generate ISA instructions through compilers)
- Hardware engineers freely optimize microarchitecture implementations within ISA constraints
- The same ISA can have multiple microarchitecture implementations (e.g., Intel's P6 and AMD's Zen families both implement x86)
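To make the contract concrete, here is a minimal sketch in Python, using a hypothetical three-instruction toy ISA with invented cycle costs (not any real instruction set): two implementations of the same ISA must agree on architectural results, while their internal cost models are free to differ.

```python
# Minimal sketch of the ISA-as-contract idea (hypothetical toy ISA, invented cycle costs).
# Two "microarchitectures" execute the same instruction stream and must agree on
# architectural state; only how many cycles each instruction takes differs.

PROGRAM = [("li", "r1", 5), ("li", "r2", 7), ("add", "r3", "r1", "r2")]

def execute(program, cycles_per_op):
    """Architecturally execute `program`; `cycles_per_op` models the microarchitecture."""
    regs, cycles = {}, 0
    for op, dst, *srcs in program:
        if op == "li":                      # load immediate into a register
            regs[dst] = srcs[0]
        elif op == "add":                   # register-register add
            regs[dst] = regs[srcs[0]] + regs[srcs[1]]
        cycles += cycles_per_op[op]
    return regs, cycles

# Same ISA, two implementations: architectural results match, cycle counts differ.
simple_core = {"li": 1, "add": 3}
fast_core   = {"li": 1, "add": 1}
r1, c1 = execute(PROGRAM, simple_core)
r2, c2 = execute(PROGRAM, fast_core)
assert r1 == r2                             # the ISA contract: identical visible results
print(r1, c1, c2)                           # {'r1': 5, 'r2': 7, 'r3': 12} 5 3
```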
2. Von Neumann Architecture vs Harvard Architecture
2.1 Von Neumann Architecture
Core idea: Stored Program -- instructions and data reside in the same memory.
Five major components:
| Component | Function |
|---|---|
| ALU (Arithmetic Logic Unit) | Performs arithmetic and logical operations |
| Control Unit | Fetches instructions, decodes, generates control signals |
| Memory | Unified storage for instructions and data |
| Input Device | Receives external data |
| Output Device | Outputs computation results |
Von Neumann bottleneck: instruction fetches and data accesses share a single path to memory, so the CPU-memory bus bandwidth limits overall system performance.
2.2 Harvard Architecture
- Instruction memory and data memory are physically separated
- Can fetch instructions and data simultaneously -- higher bandwidth
- Common in DSPs (Digital Signal Processors) and microcontrollers
- Modern processors adopt a modified Harvard architecture at the L1 Cache level (separate I-Cache and D-Cache, but shared main memory)
```mermaid
graph LR
    subgraph "Von Neumann"
        CPU1[CPU] <-->|Unified bus| MEM1[Unified Memory]
    end
    subgraph "Harvard"
        CPU2[CPU] <-->|Instruction bus| IMEM[Instruction Memory]
        CPU2 <-->|Data bus| DMEM[Data Memory]
    end
```
3. Key Performance Metrics
3.1 CPU Performance Equation
\[
\text{CPU Time} = \text{Instructions} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instructions} \times \text{CPI}}{\text{Clock Rate}}
\]
Where:
- Instructions: Instruction count (IC) of the program, determined by the program, compiler, and ISA
- CPI (Cycles Per Instruction): Average clock cycles per instruction, determined by microarchitecture
- Clock Rate: Clock cycles per second, determined by process technology and microarchitecture
IPC and CPI
IPC (Instructions Per Cycle) \(= 1/\text{CPI}\), measuring instructions executed per clock cycle. Modern superscalar processors typically have IPC > 1.
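The equation is straightforward to evaluate directly; the sketch below uses made-up instruction counts, CPI values, and clock rates purely for illustration.

```python
# Sketch of the CPU performance equation (illustrative numbers, not measured data).

def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU time = IC x CPI / clock rate."""
    return instructions * cpi / clock_rate_hz

ic = 2_000_000_000                                    # 2 billion dynamic instructions
base  = cpu_time(ic, cpi=1.5, clock_rate_hz=3e9)      # 1.00 s
tuned = cpu_time(ic, cpi=1.0, clock_rate_hz=3e9)      # better microarchitecture: ~0.67 s
print(f"base={base:.2f}s tuned={tuned:.2f}s speedup={base/tuned:.2f}x")
```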
3.2 MIPS and FLOPS
| Metric | Definition | Limitation |
|---|---|---|
| MIPS | Millions of instructions per second | Instructions differ across ISAs; not directly comparable |
| FLOPS | Floating-point operations per second | Common in scientific computing; ignores integer and memory performance |
3.3 Benchmarks
- SPEC CPU: Industry standard, divided into integer (SPECint) and floating-point (SPECfp)
- Geekbench: Cross-platform single-core/multi-core
- MLPerf: Machine learning workloads
4. Amdahl's Law
Core idea: System speedup is limited by the non-parallelizable portion.
\[
S = \frac{1}{(1 - f) + \dfrac{f}{N}}
\]
Where:
- \(f\): Fraction that can be accelerated
- \(N\): Speedup factor for that fraction
- \(S\): Overall speedup
Numerical Example
If 80% of a program is parallelizable (\(f=0.8\)), using 4 cores (\(N=4\)):
\[
S = \frac{1}{(1 - 0.8) + 0.8/4} = \frac{1}{0.2 + 0.2} = 2.5\times
\]
Even with infinitely many cores (\(N \to \infty\)), the speedup is bounded by \(1/(1-f) = 5\times\).
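Tabulating the law over increasing core counts makes the diminishing returns visible; a small sketch reusing \(f = 0.8\) from the example above:

```python
# Amdahl's Law sketch: diminishing returns as N grows (f = 0.8 from the example above).

def amdahl_speedup(f, n):
    """Overall speedup when a fraction f is accelerated by a factor n."""
    return 1.0 / ((1.0 - f) + f / n)

for n in (2, 4, 16, 256, 1_000_000):
    print(f"N={n:>9}: speedup = {amdahl_speedup(0.8, n):.3f}x")
# Speedup approaches but never exceeds 1 / (1 - 0.8) = 5x.
```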
Corollaries:
- Optimizing the bottleneck yields the greatest benefit (Make the common case fast)
- Diminishing returns from simply adding more cores
- Gustafson's Law provides a complementary perspective -- as the number of cores increases, the problem size can also grow
5. Power and Design Constraints
5.1 Dynamic Power
\[
P_{\text{dynamic}} = \alpha \, C \, V_{dd}^{2} \, f
\]
Where:
- \(\alpha\): Activity factor (switching activity)
- \(C\): Load capacitance
- \(V_{dd}\): Supply voltage
- \(f\): Clock frequency
Reducing voltage is the most effective means of lowering power (quadratic relationship), but too low a voltage prevents transistors from switching correctly.
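Since \(V_{dd}\) enters quadratically, and frequency typically must drop along with voltage (making the combined effect roughly cubic), even a modest voltage reduction pays off. A sketch with invented parameter values, not a real chip:

```python
# Dynamic power sketch: P = alpha * C * Vdd^2 * f (illustrative values only).

def dynamic_power(alpha, c_farads, vdd, freq_hz):
    """Switching power of CMOS logic."""
    return alpha * c_farads * vdd**2 * freq_hz

nominal = dynamic_power(alpha=0.2, c_farads=1e-9, vdd=1.0, freq_hz=3e9)
# DVFS-style operating point: 10% lower voltage with proportionally lower frequency.
scaled  = dynamic_power(alpha=0.2, c_farads=1e-9, vdd=0.9, freq_hz=2.7e9)
print(f"nominal={nominal:.3f} W  scaled={scaled:.3f} W  ({scaled/nominal:.0%})")
# -> roughly 73% of nominal power: the 0.9^3 cubic effect of scaling V and f together.
```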
5.2 Power Wall
Dennard Scaling broke down around 2006:
- Transistors kept shrinking, but supply voltage and leakage current stopped scaling down with them
- Power density therefore rises with transistor density instead of holding roughly constant
- This drove the architectural shift toward multi-core and heterogeneous computing
5.3 Dark Silicon
Due to thermal constraints, a large fraction of transistors on a chip cannot operate at full speed simultaneously. Countermeasures:
- Heterogeneous architectures (CPU + GPU + NPU + DSP)
- Dynamic Voltage and Frequency Scaling (DVFS)
- Application-Specific Integrated Circuits (ASICs)
6. Architecture Evolution Trends
```mermaid
timeline
    title Computer Architecture Evolution
    1940s-1950s : Vacuum tubes : Von Neumann prototypes
    1960s-1970s : Integrated circuits : IBM System/360 : Pipelining
    1980s : RISC revolution : MIPS, SPARC, ARM
    1990s : Superscalar + Out-of-order execution : Pentium Pro
    2000s : Multi-core era : Driven by power wall
    2010s : Heterogeneous computing : GPGPU : Mobile SoC
    2020s : Domain-specific architectures : Chiplet : RISC-V open source
```
| Era | Key Driver | Key Innovations |
|---|---|---|
| Single-core frequency scaling | Dennard Scaling | Pipelining, out-of-order execution, branch prediction |
| Multi-core parallelism | Power wall | Multi-core, SMT, SIMD |
| Heterogeneous specialization | Dark silicon, domain demands | GPU, TPU, NPU, FPGA |
| Open ecosystem | Cost and flexibility | RISC-V, Chiplet |
7. Design Philosophies
- Make the common case fast: Optimize the most frequently executed paths
- Simplicity favors regularity: Simple and regular designs reduce complexity
- Good design demands good compromises: Trade-offs among performance, power, area, and cost
- Locality, locality, locality: Temporal and spatial locality are the cornerstones of memory hierarchy design
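The locality point can be felt even in a short experiment. The sketch below walks the same flat array in row-major versus column-major order; in Python the gap is modest because interpreter overhead dominates, while in C with contiguous arrays it is often several-fold, but the direction is the same.

```python
# Locality sketch: sequential (row-major) vs strided (column-major) traversal.
import time

N = 2000
flat = [0] * (N * N)   # a row-major 2D array stored as one flat list

def sweep(stride_outer, stride_inner):
    """Sum all N*N elements; the strides determine the traversal order."""
    s = 0
    for i in range(N):
        base = i * stride_outer
        for j in range(N):
            s += flat[base + j * stride_inner]
    return s

for name, strides in [("row-major", (N, 1)), ("column-major", (1, N))]:
    t0 = time.perf_counter()
    sweep(*strides)
    print(f"{name}: {time.perf_counter() - t0:.3f} s")
```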
Further Reading
- Patterson & Hennessy, Computer Organization and Design
- Hennessy & Patterson, Computer Architecture: A Quantitative Approach
- Subsequent chapters: Instruction Set Architecture -> Processor Microarchitecture -> Memory Hierarchy Design -> Parallel Architecture
Relations to Other Topics
- Instruction Set Architecture continues the discussion of the ISA as the software/hardware contract.
- Processor Microarchitecture explains where performance actually comes from, including pipelining, out-of-order execution, caches, and branch prediction.
- Operating Systems shows how interrupts, exceptions, virtual memory, and privilege levels are managed at the software layer.
- Parallel Computing lifts multi-core, SIMD, and GPU capabilities into programming models and system implementation concerns.
References
- John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann.
- David A. Patterson and John L. Hennessy. Computer Organization and Design. Morgan Kaufmann.
- David A. Patterson and John L. Hennessy. Computer Organization and Design, RISC-V Edition. Morgan Kaufmann.