POYO: A Neural Foundation Model
POYO (Azabou et al., NeurIPS 2023) is the first neural foundation model pretrained at scale across datasets and subjects; together with NDT3 (2024), it inaugurated the foundation-model era for neural BCIs. Structurally, the pair parallels NLP's transition from BERT to GPT-3.
1. The "Pretrain-Fine-tune" Paradigm for Neural Data
Lessons from NLP:
- BERT/GPT pretrain on massive unlabeled text
- Downstream tasks need only small-scale labeled fine-tuning
- The result: better performance, generalization, and cross-task transfer

The neural-data counterpart:
- Pretraining: large amounts of unlabeled spike/LFP data
- Fine-tuning: small amounts of labels (gesture, cursor, speech)
- Goal: transfer across subjects, tasks, and recording modalities
POYO was the first model to make this paradigm actually work.
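To make the two-stage recipe concrete, here is a minimal PyTorch sketch of the paradigm. The backbone, head, and learning rates are illustrative placeholders under the assumption of a generic spike-token encoder, not POYO's actual training setup:

```python
import torch
import torch.nn as nn

# Stand-in backbone mapping (unit_id, time, count) tokens to features.
backbone = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))

# Stage 1: pretrain on large pooled corpora of spike tokens
# (the pretraining objective is abstracted away here).
pretrain_opt = torch.optim.AdamW(backbone.parameters(), lr=1e-4)

# Stage 2: attach a small task head and fine-tune on minutes of labeled
# data, with a lower learning rate for the pretrained weights.
head = nn.Linear(256, 2)  # e.g. 2-D cursor velocity
finetune_opt = torch.optim.AdamW(
    [{"params": backbone.parameters(), "lr": 1e-5},
     {"params": head.parameters(), "lr": 1e-3}]
)
```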
2. POYO Architecture
Core Design
Input: sparse spike tokens {(unit_i, time_j, spike_count)}
① Per-unit embedding
② Cross-attention (PerceiverIO-style)
   - Query: a fixed latent bank (e.g., 256 latents)
   - Key/Value: the input spike tokens
③ Several layers of latent self-attention
④ Task head (swappable)
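A minimal PyTorch sketch of this pipeline, built from standard `nn.MultiheadAttention` blocks. Layer sizes, the raw time feature standing in for rotary encoding, and the 2-D readout are illustrative assumptions, not POYO's actual implementation:

```python
import torch
import torch.nn as nn

class POYOStyleEncoder(nn.Module):
    """PerceiverIO-style encoder over spike tokens (illustrative sketch)."""
    def __init__(self, n_units, d=256, n_latents=256, n_layers=4):
        super().__init__()
        self.unit_emb = nn.Embedding(n_units, d)                # ① per-unit embedding
        self.latents = nn.Parameter(torch.randn(n_latents, d))  # fixed latent bank
        self.cross_attn = nn.MultiheadAttention(d, 8, batch_first=True)  # ②
        self.self_attn = nn.TransformerEncoder(                 # ③ latent self-attention
            nn.TransformerEncoderLayer(d, 8, batch_first=True), n_layers)
        self.head = nn.Linear(d, 2)                             # ④ swappable task head

    def forward(self, unit_ids, times):
        # Each spike is one token: its unit's embedding plus a time feature.
        # (POYO uses rotary position encoding on the time axis; a raw time
        # value stands in here to keep the sketch short.)
        x = self.unit_emb(unit_ids) + times.unsqueeze(-1)       # (B, n_spikes, d)
        q = self.latents.expand(x.size(0), -1, -1)              # (B, n_latents, d)
        z, _ = self.cross_attn(q, x, x)                         # latents attend to spikes
        return self.head(self.self_attn(z).mean(dim=1))         # pooled behavior readout

model = POYOStyleEncoder(n_units=96)
out = model(torch.randint(0, 96, (1, 500)), torch.rand(1, 500))  # out: (1, 2)
```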
Key Innovations
- Spike-as-token: each spike is one token (unit, time), akin to a word in a sentence (see the tokenization sketch after this list)
- PerceiverIO: a fixed latent size, decoupled from input length, which supports variable-length inputs
- Rotary position encoding: applied along the time axis
- Per-unit embedding: each neuron gets its own learned vector, enabling cross-session alignment
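The spike-as-token idea is easy to state in code. Below is a hypothetical tokenizer, assuming spike times grouped per unit; POYO's real tokenizer also handles sessions, padding, and spike counts:

```python
import numpy as np

def spikes_to_tokens(spike_times_per_unit):
    """Flatten per-unit spike times into time-sorted (unit_id, time) tokens,
    so each spike becomes one token, like a word in a sentence.
    Illustrative only; not POYO's actual tokenizer."""
    tokens = [(unit, t)
              for unit, times in enumerate(spike_times_per_unit)
              for t in times]
    tokens.sort(key=lambda tok: tok[1])        # order by spike time
    return np.array(tokens, dtype=np.float32)  # shape: (n_spikes, 2)

# Three units with irregular spike times (seconds):
toks = spikes_to_tokens([[0.01, 0.35], [0.20], [0.05, 0.22, 0.40]])
```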
3. Training Data and Scale
POYO-1 (2023):
- ~160 hours of electrophysiology
- 40+ task types
- 27 subjects (mostly monkeys)

POYO+ (2024):
- 500+ hours
- Multiple animal species plus humans
- Cross-modal (spikes + LFP + behavior)
4. Experimental Results
Zero-Shot Transfer
On unseen subjects, POYO's zero-shot decoding accuracy reaches 65–80%, versus roughly 40% for a model trained from scratch.
Few-Shot Fine-Tuning
After fine-tuning on just 5 minutes of data from a new subject, POYO beats a from-scratch baseline trained on 30 minutes of data.
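What such few-shot adaptation might look like, reusing the `POYOStyleEncoder` sketch from Section 2: following the paper's per-unit-embedding idea, the shared backbone is frozen and only embeddings for the new subject's units plus a fresh head are trained. The unit count and learning rate are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = POYOStyleEncoder(n_units=96)    # pretrained weights assumed loaded
for p in model.parameters():
    p.requires_grad = False             # freeze the shared backbone

model.unit_emb = nn.Embedding(64, 256)  # new subject: 64 unseen units
model.head = nn.Linear(256, 2)          # fresh cursor-velocity head

opt = torch.optim.AdamW(
    list(model.unit_emb.parameters()) + list(model.head.parameters()),
    lr=1e-3,
)
# A few minutes of (spike tokens, behavior) pairs now drive `opt` updates.
```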
Cross-Task Transfer
A model pretrained on monkey gesture data and transferred to handwriting outperforms one trained directly on the handwriting data.
5. POYO vs. NDT3
| | POYO | NDT3 |
|---|---|---|
| Year | NeurIPS 2023 | NeurIPS 2024 |
| Scale | ~160 h | ~500 h |
| Architecture | PerceiverIO | Perceiver + extensions |
| Tokenization | Per spike | Per unit-bin |
| Multi-modal | Limited | Complete |
| Open source | Partial | Released 2024 |
The two are sister works (Azabou is a lead author on both): POYO lays the groundwork, and NDT3 extends it.
6. Other Neural Foundation Models
BrainBERT (Wang 2023)
A foundation model for intracranial recordings, trained with masked prediction.
Neuroformer (Antoniades 2024)
Vision + neural multi-modal pretraining.
EEGPT (Pu 2024)
EEG foundation model with millions of pretraining examples.
LaBraM (Jiang 2024, ICLR)
Large Brain Model, VQ discrete tokenization + Transformer, cross-dataset EEG pretraining.
BFM (Brain Foundation Models, 2025 arXiv survey)
A survey reviewing 10+ neural foundation-model works from 2023–2024.
7. Why Foundation Models Work on Neural Data
Although neural recordings vary enormously in channel count, subjects, and tasks, they share deep common structure:
- Biological similarity: motor and visual cortex are functionally similar across human and monkey brains
- Conserved manifold geometry: cross-subject neural manifolds are similarly shaped (Gallego 2020, mentioned in Chapter 02)
- Reusable task structure: vision-motor-attention processes share cross-task commonalities
- No ceiling on self-supervision: as long as spike data are available, pretraining is possible
These properties let foundation models learn genuinely shared computational representations even from messy, heterogeneous data.
8. Downstream Tasks of Foundation Models
POYO / NDT3 foundation models can serve many downstream tasks:
- Motor decoding: cursor, robotic arm
- Speech decoding: ECoG version
- Brain-to-language: linked with LLMs
- Cognitive state: fatigue, attention, error monitoring
- Stimulus design: the inverse direction, generating stimuli to evoke a target percept
One pretrained model + multiple task heads = a platform BCI system.
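A sketch of the "one backbone, many heads" pattern, using an `nn.ModuleDict` of lightweight readouts over a shared 256-d latent; the head names and output sizes are illustrative assumptions:

```python
import torch.nn as nn

heads = nn.ModuleDict({
    "cursor":  nn.Linear(256, 2),    # 2-D cursor velocity
    "arm":     nn.Linear(256, 7),    # 7-DoF robotic arm
    "speech":  nn.Linear(256, 40),   # phoneme logits (ECoG variant)
    "fatigue": nn.Linear(256, 1),    # cognitive-state scalar
})

def decode(z, task):
    """z: shared latent of shape (batch, 256) from the pretrained backbone."""
    return heads[task](z)
```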
9. Scaling Laws for Neural Foundation Models
Preliminary observations (NDT3, POYO+) suggest:
- Doubling the data cuts error by roughly 20% (reminiscent of NLP scaling laws such as Chinchilla)
- Doubling the parameters cuts error by roughly 15%
- 10× more downstream task data yields substantial fine-tuning gains
Conclusion: BCI foundation models are still in the early scaling phase, and large-scale pretraining is expected to keep pushing SOTA over the next five years.
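Taken literally, the figures above imply power-law exponents that a quick back-of-envelope check makes explicit. This is illustrative arithmetic, not exponents reported by the papers:

```python
import math

# If error follows a power law E(D) = c * D**(-alpha), then
# "doubling data cuts error ~20%" means 2**(-alpha) = 0.8, and
# "doubling parameters cuts error ~15%" means 2**(-alpha_p) = 0.85.
alpha_data  = math.log2(1 / 0.80)  # ≈ 0.32
alpha_param = math.log2(1 / 0.85)  # ≈ 0.23
print(f"data exponent ≈ {alpha_data:.2f}, parameter exponent ≈ {alpha_param:.2f}")
```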
10. Open Challenges
- Ethics and data sharing: neural data are highly sensitive; cross-institutional aggregation faces privacy hurdles
- Electrode heterogeneity: Utah arrays, Neuropixels, and Neuralink probes produce data in different formats, and a unified tokenization is still being explored
- Closed-loop adaptation: how pretrained foundation models can keep learning during user operation
- Interpretability: foundation models are typically black boxes, but clinical use demands interpretability
- Safety: large models can be attacked or misused; LLM alignment concerns apply to BCI as well
11. Connection to Human-Like Intelligence
POYO / NDT3 and JEPA / LLM are philosophically aligned:
- JEPA: pretrained visual latent space → world model
- NDT3: pretrained neural latent space → "neural foundation model"
- LLM: pretrained language representation → general language capability
A shared theme: large-scale self-supervision plus task conditioning, trading data for generality. See Chapter 10, Link to Embodied Intelligence.
12. Logical Chain
- NLP's pretrain-fine-tune paradigm inspires BCI — but channel heterogeneity must be overcome.
- POYO uses PerceiverIO + per-unit embedding to achieve cross-subject, cross-dataset pretraining.
- POYO+ / NDT3 scale to 500+ hours, reaching the "neural GPT-3" scale.
- Foundation models substantially outperform traditional methods in zero-shot, few-shot, and cross-task settings.
- Neural-data scaling laws are still in the early phase — models are expected to keep improving over the next five years.
References
- Azabou et al. (2023). A unified, scalable framework for neural population decoding. NeurIPS. https://arxiv.org/abs/2310.16046
- Azabou, Ye et al. (2024). Multi-session, multi-task neural decoding from distinct cell-types and brain regions. NeurIPS.
- Jiang et al. (2024). Large Brain Model for learning generic representations with tremendous EEG data in BCI. ICLR. https://openreview.net/forum?id=QzTpTRVtrP
- Wang et al. (2023). BrainBERT: self-supervised representation learning for intracranial recordings. ICLR.
- Brain Foundation Models Survey (2025). arXiv:2503.00580.