Predictive Coding

1. Core Idea: The Brain as a Prediction Machine

The traditional view holds that perception is a bottom-up process: sensory organs receive stimuli, process them layer by layer, and ultimately form a percept. However, Predictive Coding theory proposes a fundamentally different picture:

The brain continuously generates top-down predictions of sensory input, compares these predictions against actual input, and only the prediction error is propagated upward.

In other words, the brain does not passively "receive" the world — it actively "guesses" at the world and continually corrects its guesses.

This theory was first formalized by Rao & Ballard (1999) in the context of the visual cortex, though its intellectual roots trace back to Helmholtz's concept of "unconscious inference" in the 19th century.


2. Hierarchical Structure and Error Propagation

The core mechanism of predictive coding rests on a hierarchical structure. Each level performs two operations:

  1. Sending predictions downward: Higher levels use their internal models to predict the activity patterns of lower levels
  2. Propagating errors upward: Lower levels compare actual activity against received predictions and pass only the difference (residual) to higher levels

This can be understood through a simple formula:

Signal passed upward = Actual input - Prediction from above = Prediction error

This implies:

  • If the prediction is perfectly correct, the error is zero, and almost no information needs to be passed upward
  • If the prediction is wrong, the error signal drives the higher level to update its internal model
  • The higher the level, the more abstract the representation and the longer the temporal horizon of prediction

| Level | Representational Content | Prediction Target | Meaning of Error |
| --- | --- | --- | --- |
| Low | Edges, textures | Pixel-level features | Local sensory surprise |
| Mid | Object parts, contours | Combinations of low-level features | Object recognition deviation |
| High | Scenes, semantics | Mid-level object configurations | Scene comprehension error |
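
The update loop implied by the formula above can be sketched in a few lines of code. The following is a minimal NumPy sketch of inference at one level of such a hierarchy, assuming a linear generative model; the names, dimensions, and learning rate are illustrative, not drawn from any particular published model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal two-level sketch: a higher level holds a latent estimate z and
# predicts the lower level's activity x through generative weights W.
W = rng.normal(size=(16, 4)) * 0.1  # generative weights: latent -> input
x = rng.normal(size=16)             # actual lower-level input
z = np.zeros(4)                     # higher-level latent estimate

lr = 0.1
for _ in range(50):
    prediction = W @ z          # top-down: predict the lower level
    error = x - prediction      # bottom-up: only the residual is passed
    z += lr * (W.T @ error)     # the error drives the higher-level update

print("remaining error:", np.linalg.norm(x - W @ z))
```

Because the latent estimate is lower-dimensional than the input, some residual error always remains — exactly the compressed "surprise" that would be passed further up the hierarchy.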

3. Computational Efficiency: Transmitting Only "Surprises"

This architecture offers a profound computational advantage: information compression.

The volume of sensory information the brain receives every second is enormous — the retina alone transmits roughly 10 megabits per second. If every level had to fully process all raw data, the cost would be prohibitive.

Predictive coding's solution is:

The brain primarily processes not "what happened" but "what happened that was unexpected."

This aligns closely with a core principle of information theory — information content equals the degree of surprise. A fully predictable signal carries no new information; only deviations from expectations are worth attending to.
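
This principle can be stated in one line: the information content (surprisal) of an event with probability p is −log₂ p bits. A quick numeric illustration, with hypothetical probabilities:

```python
import math

def surprisal_bits(p: float) -> float:
    """Shannon information content of an event with probability p."""
    return -math.log2(p)

print(surprisal_bits(0.99))   # ~0.01 bits: fully expected, almost nothing to transmit
print(surprisal_bits(0.001))  # ~9.97 bits: a rare event carries many bits
```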

An everyday example: walking down a familiar street, you barely notice anything — because everything matches your expectations. But if a peacock suddenly appears on the sidewalk, you notice it immediately — because it constitutes a massive prediction error.


4. Perception as Active Inference

Predictive coding offers a radical reframing of perception:

Perception is not passive reception but active inference — the brain is essentially "hallucinating" the entire world, then using sensory data to constrain and correct those hallucinations.

This sounds extreme, but there is a concise argument for it:

  1. Our conscious experience is rich, continuous, and complete (we don't perceive visual interruptions even when blinking)
  2. Sensory input is sparse, noisy, and delayed
  3. Therefore, most of what we "perceive" is actually filled in by the brain — the output of the brain's generative model

Dreams can be seen as an extreme case of predictive coding: when sensory input is completely cut off, the brain's generative model continues to run, producing a full subjective experience — only without error signals to correct it.

Hallucinations, in turn, can be understood as cases where the generative model becomes overactive, overwhelming the error correction signals from the senses.


5. Biological Evidence

Predictive coding is not merely an elegant theory — it is supported by solid neuroscientific evidence:

Feedback Connections Outnumber Feedforward Connections

In the cerebral cortex, feedback connections from higher to lower areas far outnumber feedforward connections from lower to higher areas. If the brain simply processed information bottom-up, these abundant feedback connections would serve no purpose. Under the predictive coding framework, however, feedback connections are precisely the channels that carry top-down predictions.

Cortical Column Structure

The basic functional unit of the cerebral cortex is the cortical column, and each cortical column contains multiple layers of neurons. Research suggests that different layers within a cortical column may separately encode prediction signals and error signals:

  • Deep layers (layers 5–6): Encode predictions, projecting downward
  • Superficial layers (layers 2–3): Encode prediction errors, projecting upward

Repetition Suppression and Expectation Suppression

When the same stimulus is presented repeatedly, the neural response diminishes — a phenomenon known as repetition suppression. Predictive coding explains this as follows: repeated stimuli are more predictable, so prediction error is smaller, and neural activity naturally decreases.
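
This explanation lends itself to a toy simulation — illustrative numbers only, not a model of any specific experiment. A single level nudges its prediction toward a repeatedly presented stimulus, and the error signal, standing in for the measured neural response, shrinks with every presentation:

```python
import numpy as np

rng = np.random.default_rng(1)

stimulus = rng.normal(size=8)   # the same stimulus, presented repeatedly
pred = np.zeros(8)              # the level's current prediction of it

for trial in range(1, 6):
    error = stimulus - pred     # prediction error on this presentation
    pred += 0.5 * error         # update the internal model toward the input
    print(f"trial {trial}: |error| = {np.linalg.norm(error):.3f}")
```

The printed error norm halves on each trial: as the stimulus becomes predictable, the signal that must be carried upward fades — which is what repetition suppression looks like from inside the theory.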


6. Cutting-Edge Developments (2025)

Predictive Coding Induces Brain-Like Responses

Research by Gutlin & Auksztulewicz (2025, PLOS Complex Systems) found that artificial neural networks trained with predictive coding algorithms reproduce brain neural response patterns more faithfully than networks trained with supervised learning.

The significance of this work lies in the following:

It provides quantifiable evidence that predictive coding is not merely a qualitative description of the brain — at an engineering level, it genuinely produces more "brain-like" computational behavior.

Predictive Coding Light

The Predictive Coding Light project, published in Nature Communications in 2025, constructed a recurrent hierarchical spiking neural network that, using only biologically plausible spike-timing-dependent plasticity (STDP) learning rules, successfully reproduced multiple processing characteristics of the visual cortex.

The key breakthroughs of this work include:

  • No reliance on backpropagation
  • Use of real spiking neurons rather than artificial neurons
  • Fully local learning rules — each synapse only needs to know the firing times of its pre- and post-synaptic neurons (a generic rule of this kind is sketched below)
  • Despite these constraints, the network spontaneously formed a hierarchical predictive coding structure
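
For readers unfamiliar with STDP, here is a generic pair-based form of such a rule — a textbook sketch with illustrative constants, not the specific rule used in Predictive Coding Light. The point is its locality: the weight change depends only on the time difference between one pre-synaptic and one post-synaptic spike:

```python
import math

A_PLUS, A_MINUS = 0.01, 0.012     # illustrative amplitude constants
TAU_PLUS, TAU_MINUS = 20.0, 20.0  # time constants in milliseconds

def stdp_dw(t_pre: float, t_post: float) -> float:
    """Weight change for one pre/post spike pair (pair-based STDP)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiation
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    else:        # post fires before (or with) pre: depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)

print(stdp_dw(10.0, 15.0))  # pre -> post by 5 ms: positive weight change
print(stdp_dw(15.0, 10.0))  # post -> pre by 5 ms: negative weight change
```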

7. Relationship to Backpropagation

Predictive coding has an intriguing relationship with backpropagation in deep learning.

Backpropagation is currently the most effective method for training deep networks, but it faces several serious biological implausibility issues:

| Issue | Backpropagation | Predictive Coding |
| --- | --- | --- |
| Weight symmetry | Requires shared weights between forward and backward paths | Not required; feedforward and feedback pathways can be independent |
| Global error signal | Must be propagated back from the output layer | Error signals are local |
| Two-phase training | Alternates between forward pass and backward pass | Can update continuously and online |
| Biological plausibility | Low | High |
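
The "error signals are local" row is the crux, and it can be made concrete. Below is a minimal single-layer sketch in the spirit of the scheme analyzed by Whittington & Bogacz (2017) — linear model, illustrative constants, a single training input, not their exact formulation — in which the weight update uses only quantities available at the synapse: pre-synaptic latent activity and post-synaptic prediction error:

```python
import numpy as np

rng = np.random.default_rng(2)

W = rng.normal(size=(16, 4)) * 0.1  # generative weights: latent -> input
x = rng.normal(size=16)             # a single training input

for epoch in range(100):
    # Inference phase: relax the latent z until prediction error settles.
    z = np.zeros(4)
    for _ in range(50):
        error = x - W @ z
        z += 0.1 * (W.T @ error)
    # Learning phase: a purely local, Hebbian-like update — the outer
    # product of post-synaptic error and pre-synaptic activity.
    error = x - W @ z
    W += 0.01 * np.outer(error, z)

print("reconstruction error:", np.linalg.norm(x - W @ z))
```

There is no global backward pass here: inference and learning both descend the same squared-error energy, each using only locally available signals.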

It has been theoretically proven that, under certain conditions, the learning dynamics of predictive coding are equivalent to backpropagation (Whittington & Bogacz, 2017; Millidge et al., 2022). However, in practice:

Predictive coding cannot yet match the performance of backpropagation on large-scale tasks. A significant gap remains between biological plausibility and engineering efficiency.

This does not mean predictive coding lacks value. It may point toward a more general and flexible learning mechanism — we simply have not yet found the right way to implement it.


8. Relationship to the Free Energy Principle

Predictive coding can be viewed as a special case of the Free Energy Principle.

Karl Friston's Free Energy Principle posits that all biological systems seek to minimize "variational free energy." At the level of perception, minimizing free energy is equivalent to minimizing prediction error — which is precisely what predictive coding does.

More specifically:

Free energy ≈ Prediction error + Model complexity

Predictive coding reduces prediction error by updating internal models, corresponding to the first term of the free energy formula. The Free Energy Principle also includes a second term — a model complexity penalty — implying that the brain tends to explain the world using the simplest possible model.
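
The "≈" in the formula above abbreviates a standard exact decomposition of variational free energy into an accuracy term and a complexity term:

```latex
F \;=\; \underbrace{-\,\mathbb{E}_{q(z)}\big[\ln p(x \mid z)\big]}_{\text{inaccuracy (prediction error)}}
\;+\; \underbrace{D_{\mathrm{KL}}\big[\,q(z)\,\Vert\,p(z)\,\big]}_{\text{model complexity}}
```

Minimizing the first term makes the model fit the data; the second term penalizes posterior beliefs q(z) that stray far from the prior p(z), which is the formal sense in which the brain prefers the simplest adequate explanation.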

This hierarchical relationship can be summarized as:

  • Free Energy Principle: The most general theoretical framework, encompassing both perception and action
  • Predictive Coding: The specific implementation of free energy minimization at the level of perception
  • Active Inference: The specific implementation of free energy minimization at the level of action

9. Why Predictive Coding Matters

Predictive coding is not just a neuroscience theory — it holds profound implications for artificial intelligence:

  1. Efficiency: Processing only "surprises" rather than all data represents a fundamental information compression strategy
  2. Biological plausibility: It offers a learning pathway that does not depend on backpropagation, potentially more suitable for implementation on neuromorphic hardware
  3. Unifying power: Seemingly disparate phenomena — perception, learning, attention, hallucinations, dreams — can all receive unified explanations under the predictive coding framework
  4. Natural connection to generative models: Predictive coding is essentially the brain running a generative model, which aligns remarkably well with the generative modeling approach in modern AI

Current deep learning systems differ from predictive coding in fundamental architectural ways: they are primarily feedforward, passive, and trained with backpropagation. If future AI systems are to become more efficient and brain-like, predictive coding offers a direction well worth serious exploration.


10. Summary

The brain is not a passive information processor but an active prediction machine. It continuously guesses at the world, corrects its guesses using sensory data, and attends only to surprises. This is the core insight of predictive coding.

The complete logical chain:

  1. The brain generates top-down predictions and propagates bottom-up errors
  2. This mechanism achieves extremely high computational efficiency — processing only "surprises"
  3. Perception is therefore not passive reception but active inference
  4. Biological evidence (feedback connections, cortical column structure) supports this theory
  5. Cutting-edge work in 2025 demonstrates that predictive coding algorithms genuinely produce brain-like computational behavior
  6. Predictive coding offers a biologically plausible alternative to backpropagation, though an engineering performance gap remains
  7. Predictive coding is the specific implementation of the Free Energy Principle at the level of perception
