Computer Architecture Overview

Computer architecture studies how abstract computation is executed on real machines, and how hardware systems trade off performance, energy, cost, and correctness. Its core objects include the instruction set architecture (ISA), microarchitecture, memory hierarchy, and parallel organization.

It belongs under computer science because the subject is not merely about physical components. It is about how computation is represented, organized, and accelerated by machines. In other words, architecture sits between software and circuits, studying how computation becomes an executable system. That makes it deeply connected to computer engineering while still being a core part of computer science.

Why It Belongs to Computer Science

| Perspective | What Architecture Studies | Why This Is a Computer Science Question |
|---|---|---|
| Abstraction | ISA, memory model, hierarchical interfaces | It studies how programs are interpreted and executed by machines |
| Performance | Pipelines, caches, parallelism, bandwidth | It studies how the same computation can be executed faster |
| Correctness | Consistency, exceptions, interrupts, privilege levels | It studies whether system behavior matches computational semantics |

1. Abstraction Layer Model

The core design philosophy of computer systems is abstraction through layering -- each layer hides the complexity of the layer below and provides a clean interface to the layer above.

```mermaid
graph TB
    A[Application] --> B[Operating System]
    B --> C[Instruction Set Architecture ISA]
    C --> D[Microarchitecture]
    D --> E[Functional Units]
    E --> F[Logic Gates]
    F --> G[Transistors]
    G --> H[Physics]

    style A fill:#e1f5fe
    style C fill:#fff9c4
    style D fill:#f3e5f5
    style H fill:#ffccbc
```

ISA (Instruction Set Architecture) is the critical contract layer between hardware and software:

  • Software engineers program against the ISA (or generate ISA instructions through compilers)
  • Hardware engineers freely optimize microarchitecture implementations within ISA constraints
  • The same ISA can have multiple microarchitecture implementations (e.g., Intel's P6 and AMD's Zen are both implementations of x86)

2. Von Neumann Architecture vs Harvard Architecture

2.1 Von Neumann Architecture

Core idea: Stored Program -- instructions and data reside in the same memory.

Five major components:

| Component | Function |
|---|---|
| ALU (Arithmetic Logic Unit) | Performs arithmetic and logical operations |
| Control Unit | Fetches instructions, decodes, generates control signals |
| Memory | Unified storage for instructions and data |
| Input Device | Receives external data |
| Output Device | Outputs computation results |

Von Neumann bottleneck: because instructions and data share one bus to memory, the CPU-memory bandwidth limits overall system performance.

2.2 Harvard Architecture

  • Instruction memory and data memory are physically separated
  • Can fetch instructions and data simultaneously -- higher bandwidth
  • Common in DSPs (Digital Signal Processors) and microcontrollers
  • Modern processors adopt a modified Harvard architecture at the L1 Cache level (separate I-Cache and D-Cache, but shared main memory)

```mermaid
graph LR
    subgraph "Von Neumann"
        CPU1[CPU] <-->|Unified bus| MEM1[Unified Memory]
    end

    subgraph "Harvard"
        CPU2[CPU] <-->|Instruction bus| IMEM[Instruction Memory]
        CPU2 <-->|Data bus| DMEM[Data Memory]
    end
```

3. Key Performance Metrics

3.1 CPU Performance Equation

\[ \text{CPU Time} = \frac{\text{Instructions} \times \text{CPI}}{\text{Clock Rate}} \]

Where:

  • Instructions: Instruction count (IC) of the program, determined by the program, compiler, and ISA
  • CPI (Cycles Per Instruction): Average clock cycles per instruction, determined by microarchitecture
  • Clock Rate: Clock cycles per second, determined by process technology and microarchitecture

IPC and CPI

IPC (Instructions Per Cycle) \(= 1/\text{CPI}\), measuring instructions executed per clock cycle. Modern superscalar processors typically have IPC > 1.
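The performance equation can be checked with a small calculation; the workload numbers below are illustrative, not measurements of any real machine:

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU Time = (Instructions x CPI) / Clock Rate."""
    return instructions * cpi / clock_rate_hz

# Illustrative workload: 10^9 instructions, average CPI of 1.25, 2.5 GHz clock.
t = cpu_time(1e9, 1.25, 2.5e9)
ipc = 1 / 1.25  # IPC is the reciprocal of CPI

print(t)    # 0.5 (seconds)
print(ipc)  # 0.8
```

Note that all three factors multiply: halving CPI (a microarchitecture change) has exactly the same effect on CPU time as doubling the clock rate.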

3.2 MIPS and FLOPS

| Metric | Definition | Limitation |
|---|---|---|
| MIPS | Millions of instructions per second | Instructions differ across ISAs; not directly comparable |
| FLOPS | Floating-point operations per second | Common in scientific computing; ignores integer and memory performance |
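The ISA-comparability problem is easy to see numerically. Native MIPS follows from the performance equation as Clock Rate / (CPI × 10⁶); the two machines below are hypothetical:

```python
def mips(clock_rate_hz, cpi):
    """Native MIPS = Clock Rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical machine A: 2.5 GHz, CPI 1.25; machine B: 2.0 GHz, CPI 0.8.
print(mips(2.5e9, 1.25))  # 2000.0
print(mips(2.0e9, 0.8))   # 2500.0
# B has the higher MIPS, yet if B's ISA needs 30% more instructions
# for the same program, A finishes the program first.
```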

3.3 Benchmarks

  • SPEC CPU: Industry standard, divided into integer (SPECint) and floating-point (SPECfp)
  • Geekbench: Cross-platform single-core/multi-core
  • MLPerf: Machine learning workloads

4. Amdahl's Law

Core idea: System speedup is limited by the non-parallelizable portion.

\[ S = \frac{1}{(1 - f) + \frac{f}{N}} \]

Where:

  • \(f\): Fraction that can be accelerated
  • \(N\): Speedup factor for that fraction
  • \(S\): Overall speedup

Numerical Example

If 80% of a program is parallelizable (\(f=0.8\)), using 4 cores (\(N=4\)):

\[S = \frac{1}{0.2 + \frac{0.8}{4}} = \frac{1}{0.2 + 0.2} = 2.5\times\]

Even with infinitely many cores (\(N \to \infty\)), the speedup is bounded by \(1/(1-f) = 5\times\).
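The numerical example above can be reproduced directly from the formula:

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: S = 1 / ((1 - f) + f / n)."""
    return 1 / ((1 - f) + f / n)

# 80% parallelizable, 4 cores -> 2.5x
print(round(amdahl_speedup(0.8, 4), 6))    # 2.5
# Even with a huge core count, the serial 20% caps speedup near 1/(1-f) = 5x
print(round(amdahl_speedup(0.8, 10**6), 3))
```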

Corollaries:

  • Optimizing the bottleneck yields the greatest benefit (Make the common case fast)
  • Diminishing returns from simply adding more cores
  • Gustafson's Law provides a complementary perspective -- as the number of cores increases, the problem size can also grow

5. Power and Design Constraints

5.1 Dynamic Power

\[ P_{\text{dynamic}} = \alpha \cdot C \cdot V_{dd}^2 \cdot f \]

Where:

  • \(\alpha\): Activity factor (switching activity)
  • \(C\): Load capacitance
  • \(V_{dd}\): Supply voltage
  • \(f\): Clock frequency

Reducing voltage is the most effective means of lowering power (quadratic relationship), but too low a voltage prevents transistors from switching correctly.
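The quadratic voltage term dominates. A quick sketch with illustrative (not real-chip) parameter values:

```python
def dynamic_power(alpha, c, vdd, f):
    """P_dynamic = alpha * C * Vdd^2 * f."""
    return alpha * c * vdd ** 2 * f

# Illustrative values: activity 0.2, capacitance 1 nF, 3 GHz clock.
# Dropping Vdd from 1.2 V to 1.0 V at fixed alpha, C, f cuts dynamic
# power to (1.0/1.2)^2 of its value, i.e. a ~31% reduction.
p_high = dynamic_power(0.2, 1e-9, 1.2, 3e9)
p_low = dynamic_power(0.2, 1e-9, 1.0, 3e9)
print(round(p_low / p_high, 3))  # 0.694
```

In practice lowering \(V_{dd}\) also forces a lower \(f\) (transistors switch more slowly), which is why DVFS adjusts voltage and frequency together.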

5.2 Power Wall

Dennard Scaling broke down around 2006:

  • Transistors shrink but leakage current no longer scales proportionally
  • Chip power density can no longer be continuously reduced
  • Drove architectural shift toward multi-core and heterogeneous computing

5.3 Dark Silicon

Due to thermal constraints, a large fraction of transistors on a chip cannot operate at full speed simultaneously. Countermeasures:

  • Heterogeneous architectures (CPU + GPU + NPU + DSP)
  • Dynamic Voltage and Frequency Scaling (DVFS)
  • Application-Specific Integrated Circuits (ASICs)
6. Historical Evolution

```mermaid
timeline
    title Computer Architecture Evolution
    1940s-1950s : Vacuum tubes : Von Neumann prototypes
    1960s-1970s : Integrated circuits : IBM System/360 : Pipelining
    1980s : RISC revolution : MIPS, SPARC, ARM
    1990s : Superscalar + Out-of-order execution : Pentium Pro
    2000s : Multi-core era : Driven by power wall
    2010s : Heterogeneous computing : GPGPU : Mobile SoC
    2020s : Domain-specific architectures : Chiplet : RISC-V open source
```

| Era | Key Driver | Key Innovations |
|---|---|---|
| Single-core frequency scaling | Dennard Scaling | Pipelining, out-of-order execution, branch prediction |
| Multi-core parallelism | Power wall | Multi-core, SMT, SIMD |
| Heterogeneous specialization | Dark silicon, domain demands | GPU, TPU, NPU, FPGA |
| Open ecosystem | Cost and flexibility | RISC-V, Chiplet |

7. Design Philosophies

  1. Make the common case fast: Optimize the most frequently executed paths
  2. Simplicity favors regularity: Simple and regular designs reduce complexity
  3. Good design demands good compromises: Trade-offs among performance, power, area, and cost
  4. Locality, locality, locality: Temporal and spatial locality are the cornerstones of memory hierarchy design
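The locality principle can be illustrated with array traversal order. This is a minimal sketch: in Python the effect is only structural, but in a compiled language on real hardware the row-major loop runs markedly faster because consecutive elements share cache lines:

```python
def row_major_sum(matrix):
    """Visit elements in storage order: good spatial locality."""
    total = 0
    for row in matrix:
        for x in row:        # consecutive addresses, cache-line friendly
            total += x
    return total

def col_major_sum(matrix):
    """Stride across rows: poor spatial locality for row-major storage."""
    total = 0
    rows, cols = len(matrix), len(matrix[0])
    for j in range(cols):
        for i in range(rows):  # each access jumps a whole row ahead
            total += matrix[i][j]
    return total

m = [[1, 2], [3, 4]]
print(row_major_sum(m), col_major_sum(m))  # 10 10
```

Both loops compute the same sum; only the memory access pattern differs, which is exactly the property the memory hierarchy is built to exploit.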

Further Reading

Relations to Other Topics

  • Instruction Set Architecture continues the discussion of the ISA as the software/hardware contract.
  • Processor Microarchitecture explains where performance actually comes from, including pipelining, out-of-order execution, caches, and branch prediction.
  • Operating Systems shows how interrupts, exceptions, virtual memory, and privilege levels are managed at the software layer.
  • Parallel Computing lifts multi-core, SIMD, and GPU capabilities into programming models and system implementation concerns.

References

  1. John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann.
  2. David A. Patterson and John L. Hennessy. Computer Organization and Design. Morgan Kaufmann.
  3. David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, RISC-V Edition. Morgan Kaufmann.
