Computer Architecture Overview

Computer architecture studies how abstract computation is executed on real machines, and how hardware systems trade off performance, energy, cost, and correctness. Its core objects include the instruction set architecture (ISA), microarchitecture, memory hierarchy, and parallel organization.

It belongs under computer science because the subject is not merely about physical components. It is about how computation is represented, organized, and accelerated by machines. In other words, architecture sits between software and circuits, studying how computation becomes an executable system. That makes it deeply connected to computer engineering while still being a core part of computer science.

Why It Belongs to Computer Science

| Perspective | What Architecture Studies | Why This Is a Computer Science Question |
|---|---|---|
| Abstraction | ISA, memory model, hierarchical interfaces | It studies how programs are interpreted and executed by machines |
| Performance | Pipelines, caches, parallelism, bandwidth | It studies how the same computation can be executed faster |
| Correctness | Consistency, exceptions, interrupts, privilege levels | It studies whether system behavior matches computational semantics |

1. Abstraction Layer Model

The core design philosophy of computer systems is abstraction through layering -- each layer hides the complexity of the layer below and provides a clean interface to the layer above.

```mermaid
graph TB
    A[Application] --> B[Operating System]
    B --> C[Instruction Set Architecture ISA]
    C --> D[Microarchitecture]
    D --> E[Functional Units]
    E --> F[Logic Gates]
    F --> G[Transistors]
    G --> H[Physics]

    style A fill:#e1f5fe
    style C fill:#fff9c4
    style D fill:#f3e5f5
    style H fill:#ffccbc
```

ISA (Instruction Set Architecture) is the critical contract layer between hardware and software:

  • Software engineers program against the ISA (or generate ISA instructions through compilers)
  • Hardware engineers freely optimize microarchitecture implementations within ISA constraints
  • The same ISA can have multiple microarchitecture implementations (e.g., Intel's P6 and AMD's Zen are both implementations of x86)

2. Von Neumann Architecture vs Harvard Architecture

2.1 Von Neumann Architecture

Core idea: Stored Program -- instructions and data reside in the same memory.

Five major components:

| Component | Function |
|---|---|
| ALU (Arithmetic Logic Unit) | Performs arithmetic and logical operations |
| Control Unit | Fetches instructions, decodes, generates control signals |
| Memory | Unified storage for instructions and data |
| Input Device | Receives external data |
| Output Device | Outputs computation results |

Von Neumann bottleneck: because instructions and data share one bus to memory, the CPU-memory bandwidth limits overall system performance.

2.2 Harvard Architecture

  • Instruction memory and data memory are physically separated
  • Can fetch instructions and data simultaneously -- higher bandwidth
  • Common in DSPs (Digital Signal Processors) and microcontrollers
  • Modern processors adopt a modified Harvard architecture at the L1 Cache level (separate I-Cache and D-Cache, but shared main memory)

```mermaid
graph LR
    subgraph "Von Neumann"
        CPU1[CPU] <-->|Unified bus| MEM1[Unified Memory]
    end

    subgraph "Harvard"
        CPU2[CPU] <-->|Instruction bus| IMEM[Instruction Memory]
        CPU2 <-->|Data bus| DMEM[Data Memory]
    end
```

3. Key Performance Metrics

3.1 CPU Performance Equation

\[ \text{CPU Time} = \frac{\text{Instructions} \times \text{CPI}}{\text{Clock Rate}} \]

Where:

  • Instructions: Instruction count (IC) of the program, determined by the program, compiler, and ISA
  • CPI (Cycles Per Instruction): Average clock cycles per instruction, determined by microarchitecture
  • Clock Rate: Clock cycles per second, determined by process technology and microarchitecture

IPC and CPI

IPC (Instructions Per Cycle) \(= 1/\text{CPI}\), measuring instructions executed per clock cycle. Modern superscalar processors typically have IPC > 1.
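The performance equation can be checked with a small calculation; the workload numbers below are illustrative, not measurements of any real machine:

```python
def cpu_time(instructions, cpi, clock_rate_hz):
    """CPU Time = (Instructions x CPI) / Clock Rate."""
    return instructions * cpi / clock_rate_hz

# Illustrative workload: 10^9 instructions, average CPI of 1.25, 2.5 GHz clock.
t = cpu_time(1e9, 1.25, 2.5e9)
ipc = 1 / 1.25  # IPC is the reciprocal of CPI

print(t)    # 0.5 (seconds)
print(ipc)  # 0.8
```

Note that all three factors multiply: halving CPI (a microarchitecture change) has exactly the same effect on CPU time as doubling the clock rate.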

3.2 MIPS and FLOPS

| Metric | Definition | Limitation |
|---|---|---|
| MIPS | Millions of instructions per second | Instructions differ across ISAs; not directly comparable |
| FLOPS | Floating-point operations per second | Common in scientific computing; ignores integer and memory performance |
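The ISA-comparability problem is easy to see numerically. Native MIPS follows from the performance equation as Clock Rate / (CPI × 10⁶); the two machines below are hypothetical:

```python
def mips(clock_rate_hz, cpi):
    """Native MIPS = Clock Rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical machine A: 2.5 GHz, CPI 1.25; machine B: 2.0 GHz, CPI 0.8.
print(mips(2.5e9, 1.25))  # 2000.0
print(mips(2.0e9, 0.8))   # 2500.0
# B has the higher MIPS, yet if B's ISA needs 30% more instructions
# for the same program, A finishes the program first.
```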

3.3 Benchmarks

  • SPEC CPU: Industry standard, divided into integer (SPECint) and floating-point (SPECfp)
  • Geekbench: Cross-platform single-core/multi-core
  • MLPerf: Machine learning workloads

4. Amdahl's Law

Core idea: System speedup is limited by the non-parallelizable portion.

\[ S = \frac{1}{(1 - f) + \frac{f}{N}} \]

Where:

  • \(f\): Fraction that can be accelerated
  • \(N\): Speedup factor for that fraction
  • \(S\): Overall speedup

Numerical Example

If 80% of a program is parallelizable (\(f=0.8\)), using 4 cores (\(N=4\)):

\[S = \frac{1}{0.2 + \frac{0.8}{4}} = \frac{1}{0.2 + 0.2} = 2.5\times\]

Even with infinitely many cores (\(N \to \infty\)), the speedup is bounded by \(1/(1-f) = 5\times\).
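The numerical example above can be reproduced directly from the formula:

```python
def amdahl_speedup(f, n):
    """Amdahl's Law: S = 1 / ((1 - f) + f / n)."""
    return 1 / ((1 - f) + f / n)

# 80% parallelizable, 4 cores -> 2.5x
print(round(amdahl_speedup(0.8, 4), 6))    # 2.5
# Even with a huge core count, the serial 20% caps speedup near 1/(1-f) = 5x
print(round(amdahl_speedup(0.8, 10**6), 3))
```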

Corollaries:

  • Optimizing the bottleneck yields the greatest benefit (Make the common case fast)
  • Diminishing returns from simply adding more cores
  • Gustafson's Law provides a complementary perspective -- as the number of cores increases, the problem size can also grow

5. Power and Design Constraints

5.1 Dynamic Power

\[ P_{\text{dynamic}} = \alpha \cdot C \cdot V_{dd}^2 \cdot f \]

Where:

  • \(\alpha\): Activity factor (switching activity)
  • \(C\): Load capacitance
  • \(V_{dd}\): Supply voltage
  • \(f\): Clock frequency

Reducing voltage is the most effective means of lowering power (quadratic relationship), but too low a voltage prevents transistors from switching correctly.
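The quadratic voltage term dominates. A quick sketch with illustrative (not real-chip) parameter values:

```python
def dynamic_power(alpha, c, vdd, f):
    """P_dynamic = alpha * C * Vdd^2 * f."""
    return alpha * c * vdd ** 2 * f

# Illustrative values: activity 0.2, capacitance 1 nF, 3 GHz clock.
# Dropping Vdd from 1.2 V to 1.0 V at fixed alpha, C, f cuts dynamic
# power to (1.0/1.2)^2 of its value, i.e. a ~31% reduction.
p_high = dynamic_power(0.2, 1e-9, 1.2, 3e9)
p_low = dynamic_power(0.2, 1e-9, 1.0, 3e9)
print(round(p_low / p_high, 3))  # 0.694
```

In practice lowering \(V_{dd}\) also forces a lower \(f\) (transistors switch more slowly), which is why DVFS adjusts voltage and frequency together.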

5.2 Power Wall

Dennard Scaling broke down around 2006:

  • Transistors shrink but leakage current no longer scales proportionally
  • Chip power density can no longer be continuously reduced
  • Drove architectural shift toward multi-core and heterogeneous computing

5.3 Dark Silicon

Due to thermal constraints, a large fraction of transistors on a chip cannot operate at full speed simultaneously. Countermeasures:

  • Heterogeneous architectures (CPU + GPU + NPU + DSP)
  • Dynamic Voltage and Frequency Scaling (DVFS)
  • Application-Specific Integrated Circuits (ASICs)
6. Historical Evolution

```mermaid
timeline
    title Computer Architecture Evolution
    1940s-1950s : Vacuum tubes : Von Neumann prototypes
    1960s-1970s : Integrated circuits : IBM System/360 : Pipelining
    1980s : RISC revolution : MIPS, SPARC, ARM
    1990s : Superscalar + Out-of-order execution : Pentium Pro
    2000s : Multi-core era : Driven by power wall
    2010s : Heterogeneous computing : GPGPU : Mobile SoC
    2020s : Domain-specific architectures : Chiplet : RISC-V open source
```

| Era | Key Driver | Key Innovations |
|---|---|---|
| Single-core frequency scaling | Dennard Scaling | Pipelining, out-of-order execution, branch prediction |
| Multi-core parallelism | Power wall | Multi-core, SMT, SIMD |
| Heterogeneous specialization | Dark silicon, domain demands | GPU, TPU, NPU, FPGA |
| Open ecosystem | Cost and flexibility | RISC-V, Chiplet |

7. Design Philosophies

  1. Make the common case fast: Optimize the most frequently executed paths
  2. Simplicity favors regularity: Simple and regular designs reduce complexity
  3. Good design demands good compromises: Trade-offs among performance, power, area, and cost
  4. Locality, locality, locality: Temporal and spatial locality are the cornerstones of memory hierarchy design
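The locality principle can be illustrated with array traversal order. This is a minimal sketch: in Python the effect is only structural, but in a compiled language on real hardware the row-major loop runs markedly faster because consecutive elements share cache lines:

```python
def row_major_sum(matrix):
    """Visit elements in storage order: good spatial locality."""
    total = 0
    for row in matrix:
        for x in row:        # consecutive addresses, cache-line friendly
            total += x
    return total

def col_major_sum(matrix):
    """Stride across rows: poor spatial locality for row-major storage."""
    total = 0
    rows, cols = len(matrix), len(matrix[0])
    for j in range(cols):
        for i in range(rows):  # each access jumps a whole row ahead
            total += matrix[i][j]
    return total

m = [[1, 2], [3, 4]]
print(row_major_sum(m), col_major_sum(m))  # 10 10
```

Both loops compute the same sum; only the memory access pattern differs, which is exactly the property the memory hierarchy is built to exploit.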

Further Reading

Relations to Other Topics

  • Instruction Set Architecture continues the discussion of the ISA as the software/hardware contract.
  • Processor Microarchitecture explains where performance actually comes from, including pipelining, out-of-order execution, caches, and branch prediction.
  • Operating Systems shows how interrupts, exceptions, virtual memory, and privilege levels are managed at the software layer.
  • Parallel Computing lifts multi-core, SIMD, and GPU capabilities into programming models and system implementation concerns.

References

  1. John L. Hennessy and David A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann.
  2. David A. Patterson and John L. Hennessy. Computer Organization and Design. Morgan Kaufmann.
  3. David A. Patterson and John L. Hennessy. Computer Organization and Design: The Hardware/Software Interface, RISC-V Edition. Morgan Kaufmann.
