Neural Manifolds and RL Policy
Neural manifolds and the latent space of an RL policy are the same abstraction discovered in two independent fields: high-dimensional neural states evolve on a low-dimensional manifold, and the activations of a high-dimensional policy network similarly encode the task in a low-dimensional representation. This isomorphism enables mathematical dialogue between BCI and RL.
1. Neural Manifold Recap
See Neural Manifolds and Dynamics.
Core facts
- An M1 recording of ~96 neurons typically has an effective task dimensionality of ~10
- Neural states evolve on a low-dimensional manifold
- Manifold structure is conserved during learning
2. The Latent Space of RL Policy
Deep RL networks
A typical deep RL policy:
state → encoder (CNN) → latent → policy head → action
- The latent is typically 256- or 512-dimensional
- But its actual information content is far lower: PCA often finds fewer than ~32 effective dimensions (see the sketch below)
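A quick way to check this claim is to run PCA over latent activations collected from a trained policy and count the components needed to explain most of the variance. The sketch below uses synthetic 512-dimensional activations, built to lie on a ~12-dimensional subspace plus noise, as a stand-in for real rollout data.

```python
# Sketch: estimate the effective dimensionality of a policy's latent
# activations with PCA. Real usage would collect `latents` from rollouts
# of a trained agent; a synthetic (n_steps, 512) array stands in here.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
low_d = rng.standard_normal((5000, 12))              # true low-dim signal
mixing = rng.standard_normal((12, 512))              # embed into 512 dims
latents = low_d @ mixing + 0.05 * rng.standard_normal((5000, 512))

pca = PCA().fit(latents)
cum_var = np.cumsum(pca.explained_variance_ratio_)
effective_dim = int(np.searchsorted(cum_var, 0.90)) + 1
print(f"components for 90% variance: {effective_dim} / {latents.shape[1]}")
```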
Structure of the latent space
- Similar states cluster
- Similar actions lie along continuous directions
- Task structure is embedded in the latent space
This is strikingly similar to the M1 neural manifold.
3. Three Levels of Isomorphism
1. Geometry
- Neural manifold: PCA to 2D / 3D reveals structure
- RL latent space: likewise — task / action / state separate along certain directions
Sussillo et al. (2015) found that an RNN trained to generate muscle activity for reaching developed hidden dynamics closely resembling those of M1.
2. Dynamics
- M1: rotations, attractors, preparation-execution separation
- Meta-RL RNN: the same rotations, attractors
Wang et al. (2018), the prefrontal meta-RL work: training an RNN across multiple RL tasks yields hidden dynamics that reproduce key phenomena observed in prefrontal cortex.
3. Learning
- Neural plasticity: STDP changes connections → changes dynamics
- RL gradients: policy updates → change parameters → change activations
Both are optimization of a parameterized dynamical system.
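A common way to make the geometric and dynamical comparisons above quantitative is canonical correlation analysis (CCA) between model and neural population activity. The sketch below uses synthetic stand-ins for RNN hidden states and M1 firing rates that share a few latent dimensions; in practice they would be time-aligned, trial-matched recordings.

```python
# Sketch: compare two population geometries with CCA, a standard tool for
# quantifying how similar RNN hidden states and recorded M1 activity are.
# Both datasets below are synthetic and share 8 latent dimensions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
T = 2000
shared = rng.standard_normal((T, 8))                 # shared latent dynamics
rnn_hidden = shared @ rng.standard_normal((8, 128)) + 0.1 * rng.standard_normal((T, 128))
m1_rates = shared @ rng.standard_normal((8, 96)) + 0.1 * rng.standard_normal((T, 96))

cca = CCA(n_components=8, max_iter=1000).fit(rnn_hidden, m1_rates)
u, v = cca.transform(rnn_hidden, m1_rates)
corrs = [float(np.corrcoef(u[:, i], v[:, i])[0, 1]) for i in range(8)]
print("canonical correlations:", np.round(corrs, 2))
```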
4. The Gallego-Miller-Solla Paradigm
The Gallego, Miller, and Solla line of work (2017, Neuron; 2020, Nat Neurosci):
Key claims
- Neural manifolds are the objects of computation, not byproducts
- Manifolds are conserved across individuals and time
- Manifolds share structure across tasks
Implications for BCI
- Decoding should be done on the manifold, not on individual neurons
- Transfer learning becomes feasible: align at the manifold level rather than matching individual neurons (sketched below)
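As a deliberately simplified picture of manifold-level alignment, the sketch below projects two synthetic "sessions" onto their own PCA manifolds and fits a linear map between the manifold coordinates; orthogonal Procrustes is a common constrained alternative to the unconstrained least-squares map used here.

```python
# Sketch: align two sessions at the manifold level. Each synthetic "session"
# shares the same 10-dim latent dynamics but has different neuron-level tuning;
# PCA recovers each session's manifold coordinates, and a least-squares linear
# map aligns them so a day-0 decoder could be reused on day K.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
latent = rng.standard_normal((3000, 10))              # shared latent dynamics
day0 = latent @ rng.standard_normal((10, 96))         # day-0 neural tuning
dayK = latent @ rng.standard_normal((10, 96))         # different tuning on day K

z0 = PCA(n_components=10).fit_transform(day0)         # reference manifold coords
zK = PCA(n_components=10).fit_transform(dayK)         # new-session manifold coords

M, *_ = np.linalg.lstsq(zK, z0, rcond=None)           # map day-K coords to day-0 coords
err = np.linalg.norm(zK @ M - z0) / np.linalg.norm(z0)
print(f"relative alignment error after mapping: {err:.3f}")
```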
5. The Role of CEBRA
See CEBRA and Contrastive Learning.
Aligning neural + behavior
CEBRA uses contrastive learning to co-map neural activity and behavior (or time) into a joint manifold:
- Consistent dimensionality
- Aligned geometry
- Facilitates decoding (a minimal usage sketch follows)
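A minimal usage sketch, assuming the `cebra` package's scikit-learn-style interface (a `CEBRA` class with `fit`/`transform`); hyperparameter names follow its published examples and should be checked against the installed version. The spike and behavior arrays are synthetic stand-ins for binned firing rates and, e.g., hand velocity.

```python
# Minimal CEBRA sketch (pip install cebra); data and hyperparameters are
# illustrative, not tuned. fit(X, y) runs behavior-conditioned contrastive
# training; transform(X) returns the joint manifold coordinates.
import numpy as np
from cebra import CEBRA

rng = np.random.default_rng(0)
spikes = rng.poisson(2.0, size=(1000, 96)).astype(float)   # (timebins, neurons)
behavior = rng.standard_normal((1000, 2))                  # (timebins, velocity xy)

model = CEBRA(
    model_architecture="offset10-model",
    output_dimension=8,          # target manifold dimensionality
    max_iterations=1000,
    batch_size=512,
    temperature=1.0,
)
model.fit(spikes, behavior)                # behavior-conditioned contrastive training
embedding = model.transform(spikes)        # (timebins, 8) joint manifold coordinates
print(embedding.shape)
```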
Integration with RL
- CEBRA's latent space can serve directly as the policy's latent space
- Training RL in CEBRA space → potentially more efficient than in raw neuron space
- As of 2024 there is little published work here, but the area is developing quickly
6. Applications in Embodied Intelligence
BCI controlling robots
- M1 neural signals → CEBRA → intent latent
- LLM / RL policy → robot control
- Key design: align the neural intent latent with the robot policy latent (see the sketch below)
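A minimal sketch of that alignment step, with every name, shape, and mapping invented for illustration: a linear "bridge" is fit (e.g., on calibration trials) from the neural intent latent to the robot policy's latent space, and a frozen policy head then produces the command.

```python
# Sketch: bridge a decoded intent latent into a robot policy's latent space.
# All shapes, weights, and the calibration signal are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
d_intent, d_policy, d_action, n_calib = 8, 32, 7, 500

W_head = rng.standard_normal((d_policy, d_action))   # frozen policy head weights
def policy_head(z_policy):
    return np.tanh(z_policy @ W_head)                # policy latent -> 7-DoF command

# Calibration pairs: intent embeddings and the policy latents that produce
# the matching robot motion (hypothetical supervision signal).
intent_calib = rng.standard_normal((n_calib, d_intent))
policy_calib = intent_calib @ rng.standard_normal((d_intent, d_policy))

bridge, *_ = np.linalg.lstsq(intent_calib, policy_calib, rcond=None)   # linear bridge

z_intent = rng.standard_normal(d_intent)             # one decoded intent sample
print("robot command:", np.round(policy_head(z_intent @ bridge), 2))
```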
Shared representation learning
- Joint training of BCI + RL
- Shared intermediate representations
- End-to-end from neurons to actions
World-model bridging
- Neural manifold = biological "world-model state"
- RL policy latent = artificial "world-model state"
- BCI = the bridge aligning the two
7. Geometric Semantics of the Latent Space
Euclidean vs. manifold
- Simple methods: assume Euclidean (PCA, linear decoding)
- Geometry-aware methods: respect the manifold's nonlinear structure (Isomap, LLE, UMAP, t-SNE)
- CEBRA implicitly performs nonlinear alignment
Geodesic distance
- Distance between two points on a manifold ≠ Euclidean distance
- A geodesic approximates the path that neural dynamics actually traverse on the manifold (sketched below)
- The RL analogue is the natural (Riemannian) policy gradient
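To make the distinction concrete, the sketch below approximates geodesic distance by shortest paths through a k-nearest-neighbor graph, the same construction Isomap uses, with a swiss roll standing in for a curved neural manifold.

```python
# Sketch: geodesic vs. Euclidean distance on a curved manifold, with geodesic
# distance approximated by shortest paths through a weighted k-NN graph.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=1500, random_state=0)

graph = kneighbors_graph(X, n_neighbors=10, mode="distance")   # weighted k-NN graph
i, j = 0, 750                                                   # two arbitrary points
geodesic = shortest_path(graph, method="D", directed=False, indices=[i])[0, j]
euclidean = np.linalg.norm(X[i] - X[j])
print(f"Euclidean: {euclidean:.1f}   geodesic: {geodesic:.1f}")
```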
8. Joint BCI × RL Research
Offline RL + BCI
- Neural activity as observation
- User intent as action
- Offline RL trains the decoder, much as behavior cloning does (sketched below)
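A minimal sketch of this view on synthetic data: binned firing rates are the observations, recorded cursor velocity is the "expert action", and ridge regression stands in for a fuller offline-RL objective.

```python
# Sketch: decoder fitting as behavior cloning on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
intent = rng.standard_normal((4000, 2))               # true 2-D velocity ("expert action")
rates = intent @ rng.standard_normal((2, 96)) + 0.5 * rng.standard_normal((4000, 96))

X_tr, X_te, y_tr, y_te = train_test_split(rates, intent, test_size=0.25, random_state=0)
decoder = Ridge(alpha=1.0).fit(X_tr, y_tr)            # supervised "cloning" fit
print(f"held-out R^2: {decoder.score(X_te, y_te):.2f}")
```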
Online RL + BCI
- Co-adaptation is an RL problem
- SmoothBatch-style recalibration (see ReFIT and Online Calibration) behaves like a policy-gradient update (sketched below)
- System + user = multi-agent RL
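A simplified sketch of a SmoothBatch-style update on synthetic data: each batch, a fresh linear decoder is fit to the latest trials and blended with the running decoder, which acts like a small, smoothed policy-update step. The published method blends Kalman-filter parameters; a plain linear map stands in here.

```python
# Simplified SmoothBatch-style co-adaptive update (linear decoder stand-in).
import numpy as np

rng = np.random.default_rng(5)
n_neurons, alpha = 96, 0.8                    # alpha: weight kept on the old decoder
true_map = rng.standard_normal((n_neurons, 2))
W = np.zeros((n_neurons, 2))                  # running decoder weights

for batch in range(20):
    rates = rng.standard_normal((200, n_neurons))
    intent = rates @ true_map + 0.3 * rng.standard_normal((200, 2))
    W_batch, *_ = np.linalg.lstsq(rates, intent, rcond=None)   # refit on this batch
    W = alpha * W + (1 - alpha) * W_batch                       # smoothed blend

print(f"decoder error vs. true mapping: {np.linalg.norm(W - true_map):.2f}")
```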
Meta-RL for BCI
- Each user is a task
- Meta-RL learns a universal prior across users
- NDT3's cross-subject training is an implicit form of meta-learning (see the sketch below)
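A sketch of the "each user is a task" idea, using a Reptile-style first-order meta-update as a stand-in for full meta-RL; every user here is a synthetic linear neural-to-velocity mapping that shares a common structure, which the meta-decoder converges toward.

```python
# Sketch: Reptile-style meta-learning of a decoder prior across synthetic users.
import numpy as np

rng = np.random.default_rng(6)
n_neurons, meta_lr, inner_lr, inner_steps = 64, 0.2, 0.05, 10
W_shared = rng.standard_normal((n_neurons, 2))   # structure common to all users
meta_W = np.zeros((n_neurons, 2))                # meta-learned prior decoder

for _ in range(200):                                               # meta-training loop
    W_user = W_shared + 0.2 * rng.standard_normal((n_neurons, 2))  # one "user"/task
    X = rng.standard_normal((300, n_neurons))
    Y = X @ W_user + 0.3 * rng.standard_normal((300, 2))
    W = meta_W.copy()
    for _ in range(inner_steps):                                   # inner adaptation (SGD on MSE)
        W -= inner_lr * X.T @ (X @ W - Y) / len(X)
    meta_W += meta_lr * (W - meta_W)                               # Reptile meta-update

print(f"distance of meta-decoder from shared structure: "
      f"{np.linalg.norm(meta_W - W_shared):.2f} (||W_shared|| = {np.linalg.norm(W_shared):.1f})")
```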
9. The Human-Like Intelligence Perspective
World model = neural manifold
The Human-Like Intelligence world_model chapter:
- World models learn internal dynamics
- Consistent with the neural manifold
JEPA = the objective of the neural manifold
Yann LeCun's JEPA idea:
- Don't predict pixels; predict abstract representations
- Abstract representations = the manifold the brain actually uses
Meta-learning = cross-manifold transfer
Cross-task meta-learning discovers the "manifold structure among tasks" — echoing Gallego-Miller-Solla's cross-individual manifold conservation.
10. Future: From Neuron-Level to Manifold-Level BCI
Traditional BCI
- Process each neuron separately
- Decoder = neuron weights
Manifold-level BCI
- First map data to a manifold
- Decoding in manifold coordinates
- Cross-subject transferable
The NDT3 / POYO Direction
Neural foundation models are effectively building a "universal neural manifold":
- Pretraining across many subjects and sessions
- Shared manifold structure
- Rapid adaptation to new subjects
11. Logical Chain
- Neural manifold + RL policy latent = the same abstraction: low-dimensional dynamics.
- Geometry, dynamics, and learning are isomorphic at three levels.
- Gallego-Miller-Solla elevate the manifold to a computational object.
- CEBRA aligns neural + behavioral manifolds and is the BCI × RL bridge.
- In embodied intelligence: aligning M1 manifold + robot policy latent = natural control.
- Meta-RL + NDT3 is the direction for cross-subject manifold transfer.
- The world model, JEPA, and meta-learning of Human-Like Intelligence all resonate with this view.
References
- Gallego et al. (2017). Neural manifolds for the control of movement. Neuron.
- Gallego et al. (2020). Long-term stability of cortical population dynamics underlying consistent behavior. Nat Neurosci.
- Wang et al. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nat Neurosci.
- Schneider et al. (2023). Learnable latent embeddings for joint behavioral and neural analysis. Nature.
- Sussillo et al. (2015). A neural network that finds a naturalistic solution for the production of muscle activity. Nat Neurosci.