Neural Manifolds and RL Policy
Neural manifolds and the latent space of an RL policy are the same abstraction discovered in two independent fields: high-dimensional neural states evolve on a low-dimensional manifold, and the activations of a high-dimensional policy network similarly encode the task in a low-dimensional representation. This isomorphism enables mathematical dialogue between BCI and RL.
1. Neural Manifold Recap
See Neural Manifolds and Dynamics.
Core facts
- An M1 recording of ~96 neurons typically has an effective task dimensionality of ~10
- Neural states evolve on a low-dimensional manifold
- Manifold structure is conserved during learning
2. The Latent Space of RL Policy
Deep RL networks
A typical deep RL policy:
state → encoder (CNN) → latent → policy head → action
- The latent is typically 256- or 512-dimensional
- But its actual information content is far lower: PCA often finds fewer than ~32 effective dimensions (see the sketch below)
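A quick way to check this claim is to run PCA over latent activations collected from a trained policy and count the components needed to explain most of the variance. The sketch below uses synthetic 512-dimensional activations, built to lie on a ~12-dimensional subspace plus noise, as a stand-in for real rollout data.

```python
# Sketch: estimate the effective dimensionality of a policy's latent
# activations with PCA. Real usage would collect `latents` from rollouts
# of a trained agent; a synthetic (n_steps, 512) array stands in here.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
low_d = rng.standard_normal((5000, 12))              # true low-dim signal
mixing = rng.standard_normal((12, 512))              # embed into 512 dims
latents = low_d @ mixing + 0.05 * rng.standard_normal((5000, 512))

pca = PCA().fit(latents)
cum_var = np.cumsum(pca.explained_variance_ratio_)
effective_dim = int(np.searchsorted(cum_var, 0.90)) + 1
print(f"components for 90% variance: {effective_dim} / {latents.shape[1]}")
```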
Structure of the latent space
- Similar states cluster
- Similar actions lie along continuous directions
- Task structure is embedded in the latent space
This is strikingly similar to the M1 neural manifold.
3. Three Levels of Isomorphism
1. Geometry
- Neural manifold: PCA to 2D / 3D reveals structure
- RL latent space: likewise — task / action / state separate along certain directions
Sussillo et al. (2015) found that an RNN trained to generate muscle activity for reaching developed hidden dynamics closely resembling those of M1.
2. Dynamics
- M1: rotations, attractors, preparation-execution separation
- Meta-RL RNN: the same rotations, attractors
Wang et al. (2018), the prefrontal meta-RL work: training an RNN across multiple RL tasks yields hidden dynamics that reproduce key phenomena observed in prefrontal cortex.
3. Learning
- Neural plasticity: STDP changes connections → changes dynamics
- RL gradients: policy updates → change parameters → change activations
Both are optimization of a parameterized dynamical system.
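A common way to make the geometric and dynamical comparisons above quantitative is canonical correlation analysis (CCA) between model and neural population activity. The sketch below uses synthetic stand-ins for RNN hidden states and M1 firing rates that share a few latent dimensions; in practice they would be time-aligned, trial-matched recordings.

```python
# Sketch: compare two population geometries with CCA, a standard tool for
# quantifying how similar RNN hidden states and recorded M1 activity are.
# Both datasets below are synthetic and share 8 latent dimensions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
T = 2000
shared = rng.standard_normal((T, 8))                 # shared latent dynamics
rnn_hidden = shared @ rng.standard_normal((8, 128)) + 0.1 * rng.standard_normal((T, 128))
m1_rates = shared @ rng.standard_normal((8, 96)) + 0.1 * rng.standard_normal((T, 96))

cca = CCA(n_components=8, max_iter=1000).fit(rnn_hidden, m1_rates)
u, v = cca.transform(rnn_hidden, m1_rates)
corrs = [float(np.corrcoef(u[:, i], v[:, i])[0, 1]) for i in range(8)]
print("canonical correlations:", np.round(corrs, 2))
```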
4. The Gallego-Miller-Solla Paradigm
The Gallego, Miller, and Solla line of work (2017, Neuron; 2020, Nat Neurosci):
Key claims
- Neural manifolds are the objects of computation, not byproducts
- Manifolds are conserved across individuals and time
- Manifolds share structure across tasks
Implications for BCI
- Decoding should be done on the manifold, not on individual neurons
- Transfer learning becomes feasible: align at the manifold level rather than matching individual neurons (sketched below)
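As a deliberately simplified picture of manifold-level alignment, the sketch below projects two synthetic "sessions" onto their own PCA manifolds and fits a linear map between the manifold coordinates; orthogonal Procrustes is a common constrained alternative to the unconstrained least-squares map used here.

```python
# Sketch: align two sessions at the manifold level. Each synthetic "session"
# shares the same 10-dim latent dynamics but has different neuron-level tuning;
# PCA recovers each session's manifold coordinates, and a least-squares linear
# map aligns them so a day-0 decoder could be reused on day K.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
latent = rng.standard_normal((3000, 10))              # shared latent dynamics
day0 = latent @ rng.standard_normal((10, 96))         # day-0 neural tuning
dayK = latent @ rng.standard_normal((10, 96))         # different tuning on day K

z0 = PCA(n_components=10).fit_transform(day0)         # reference manifold coords
zK = PCA(n_components=10).fit_transform(dayK)         # new-session manifold coords

M, *_ = np.linalg.lstsq(zK, z0, rcond=None)           # map day-K coords to day-0 coords
err = np.linalg.norm(zK @ M - z0) / np.linalg.norm(z0)
print(f"relative alignment error after mapping: {err:.3f}")
```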
5. The Role of CEBRA
See CEBRA and Contrastive Learning.
Aligning neural + behavior
CEBRA uses contrastive learning to co-map neural activity and behavior (or time) into a joint manifold:
- Consistent dimensionality
- Aligned geometry
- Facilitates decoding (a minimal usage sketch follows)
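A minimal usage sketch, assuming the `cebra` package's scikit-learn-style interface (a `CEBRA` class with `fit`/`transform`); hyperparameter names follow its published examples and should be checked against the installed version. The spike and behavior arrays are synthetic stand-ins for binned firing rates and, e.g., hand velocity.

```python
# Minimal CEBRA sketch (pip install cebra); data and hyperparameters are
# illustrative, not tuned. fit(X, y) runs behavior-conditioned contrastive
# training; transform(X) returns the joint manifold coordinates.
import numpy as np
from cebra import CEBRA

rng = np.random.default_rng(0)
spikes = rng.poisson(2.0, size=(1000, 96)).astype(float)   # (timebins, neurons)
behavior = rng.standard_normal((1000, 2))                  # (timebins, velocity xy)

model = CEBRA(
    model_architecture="offset10-model",
    output_dimension=8,          # target manifold dimensionality
    max_iterations=1000,
    batch_size=512,
    temperature=1.0,
)
model.fit(spikes, behavior)                # behavior-conditioned contrastive training
embedding = model.transform(spikes)        # (timebins, 8) joint manifold coordinates
print(embedding.shape)
```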
Integration with RL
- CEBRA's latent space can serve directly as the policy's latent space
- Training RL in CEBRA space → potentially more efficient than in raw neuron space
- As of 2024 there is little published work here, but the area is developing quickly
6. Applications in Embodied Intelligence
BCI controlling robots
- M1 neural signals → CEBRA → intent latent
- LLM / RL policy → robot control
- Key design: align the neural intent latent with the robot policy latent (see the sketch below)
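A minimal sketch of that alignment step, with every name, shape, and mapping invented for illustration: a linear "bridge" is fit (e.g., on calibration trials) from the neural intent latent to the robot policy's latent space, and a frozen policy head then produces the command.

```python
# Sketch: bridge a decoded intent latent into a robot policy's latent space.
# All shapes, weights, and the calibration signal are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
d_intent, d_policy, d_action, n_calib = 8, 32, 7, 500

W_head = rng.standard_normal((d_policy, d_action))   # frozen policy head weights
def policy_head(z_policy):
    return np.tanh(z_policy @ W_head)                # policy latent -> 7-DoF command

# Calibration pairs: intent embeddings and the policy latents that produce
# the matching robot motion (hypothetical supervision signal).
intent_calib = rng.standard_normal((n_calib, d_intent))
policy_calib = intent_calib @ rng.standard_normal((d_intent, d_policy))

bridge, *_ = np.linalg.lstsq(intent_calib, policy_calib, rcond=None)   # linear bridge

z_intent = rng.standard_normal(d_intent)             # one decoded intent sample
print("robot command:", np.round(policy_head(z_intent @ bridge), 2))
```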
Shared representation learning
- Joint training of BCI + RL
- Shared intermediate representations
- End-to-end from neurons to actions
World-model bridging
- Neural manifold = biological "world-model state"
- RL policy latent = artificial "world-model state"
- BCI = the bridge aligning the two
7. Geometric Semantics of the Latent Space
Euclidean vs. manifold
- Simple methods: assume Euclidean (PCA, linear decoding)
- Geometry-aware methods: respect the manifold's nonlinear structure (Isomap, LLE, UMAP, t-SNE)
- CEBRA implicitly performs nonlinear alignment
Geodesic distance
- Distance between two points on a manifold ≠ Euclidean distance
- A geodesic approximates the path that neural dynamics actually traverse on the manifold (sketched below)
- The RL analogue is the natural (Riemannian) policy gradient
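To make the distinction concrete, the sketch below approximates geodesic distance by shortest paths through a k-nearest-neighbor graph, the same construction Isomap uses, with a swiss roll standing in for a curved neural manifold.

```python
# Sketch: geodesic vs. Euclidean distance on a curved manifold, with geodesic
# distance approximated by shortest paths through a weighted k-NN graph.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

X, _ = make_swiss_roll(n_samples=1500, random_state=0)

graph = kneighbors_graph(X, n_neighbors=10, mode="distance")   # weighted k-NN graph
i, j = 0, 750                                                   # two arbitrary points
geodesic = shortest_path(graph, method="D", directed=False, indices=[i])[0, j]
euclidean = np.linalg.norm(X[i] - X[j])
print(f"Euclidean: {euclidean:.1f}   geodesic: {geodesic:.1f}")
```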
8. Joint BCI × RL Research
Offline RL + BCI
- Neural activity as observation
- User intent as action
- Offline RL trains the decoder, much as behavior cloning does (sketched below)
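A minimal sketch of this view on synthetic data: binned firing rates are the observations, recorded cursor velocity is the "expert action", and ridge regression stands in for a fuller offline-RL objective.

```python
# Sketch: decoder fitting as behavior cloning on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
intent = rng.standard_normal((4000, 2))               # true 2-D velocity ("expert action")
rates = intent @ rng.standard_normal((2, 96)) + 0.5 * rng.standard_normal((4000, 96))

X_tr, X_te, y_tr, y_te = train_test_split(rates, intent, test_size=0.25, random_state=0)
decoder = Ridge(alpha=1.0).fit(X_tr, y_tr)            # supervised "cloning" fit
print(f"held-out R^2: {decoder.score(X_te, y_te):.2f}")
```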
Online RL + BCI
- Co-adaptation is an RL problem
- SmoothBatch-style recalibration (see ReFIT and Online Calibration) behaves like a policy-gradient update (sketched below)
- System + user = multi-agent RL
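A simplified sketch of a SmoothBatch-style update on synthetic data: each batch, a fresh linear decoder is fit to the latest trials and blended with the running decoder, which acts like a small, smoothed policy-update step. The published method blends Kalman-filter parameters; a plain linear map stands in here.

```python
# Simplified SmoothBatch-style co-adaptive update (linear decoder stand-in).
import numpy as np

rng = np.random.default_rng(5)
n_neurons, alpha = 96, 0.8                    # alpha: weight kept on the old decoder
true_map = rng.standard_normal((n_neurons, 2))
W = np.zeros((n_neurons, 2))                  # running decoder weights

for batch in range(20):
    rates = rng.standard_normal((200, n_neurons))
    intent = rates @ true_map + 0.3 * rng.standard_normal((200, 2))
    W_batch, *_ = np.linalg.lstsq(rates, intent, rcond=None)   # refit on this batch
    W = alpha * W + (1 - alpha) * W_batch                       # smoothed blend

print(f"decoder error vs. true mapping: {np.linalg.norm(W - true_map):.2f}")
```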
Meta-RL for BCI
- Each user is a task
- Meta-RL learns a universal prior across users
- NDT3's cross-subject training is an implicit form of meta-learning (see the sketch below)
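A sketch of the "each user is a task" idea, using a Reptile-style first-order meta-update as a stand-in for full meta-RL; every user here is a synthetic linear neural-to-velocity mapping that shares a common structure, which the meta-decoder converges toward.

```python
# Sketch: Reptile-style meta-learning of a decoder prior across synthetic users.
import numpy as np

rng = np.random.default_rng(6)
n_neurons, meta_lr, inner_lr, inner_steps = 64, 0.2, 0.05, 10
W_shared = rng.standard_normal((n_neurons, 2))   # structure common to all users
meta_W = np.zeros((n_neurons, 2))                # meta-learned prior decoder

for _ in range(200):                                               # meta-training loop
    W_user = W_shared + 0.2 * rng.standard_normal((n_neurons, 2))  # one "user"/task
    X = rng.standard_normal((300, n_neurons))
    Y = X @ W_user + 0.3 * rng.standard_normal((300, 2))
    W = meta_W.copy()
    for _ in range(inner_steps):                                   # inner adaptation (SGD on MSE)
        W -= inner_lr * X.T @ (X @ W - Y) / len(X)
    meta_W += meta_lr * (W - meta_W)                               # Reptile meta-update

print(f"distance of meta-decoder from shared structure: "
      f"{np.linalg.norm(meta_W - W_shared):.2f} (||W_shared|| = {np.linalg.norm(W_shared):.1f})")
```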
9. The Human-Like Intelligence Perspective
World model = neural manifold
The Human-Like Intelligence world_model chapter:
- World models learn internal dynamics
- Consistent with the neural manifold
JEPA = the objective of the neural manifold
Yann LeCun's JEPA idea:
- Don't predict pixels; predict abstract representations
- Abstract representations = the manifold the brain actually uses
Meta-learning = cross-manifold transfer
Cross-task meta-learning discovers the "manifold structure among tasks" — echoing Gallego-Miller-Solla's cross-individual manifold conservation.
10. Future: From Neuron-Level to Manifold-Level BCI
Traditional BCI
- Process each neuron separately
- Decoder = neuron weights
Manifold-level BCI
- First map data to a manifold
- Decoding in manifold coordinates
- Cross-subject transferable
The NDT3 / POYO Direction
Neural foundation models are effectively building a "universal neural manifold":
- Pretraining across many subjects and sessions
- Shared manifold structure
- Rapid adaptation to new subjects
11. Logical Chain
- Neural manifold + RL policy latent = the same abstraction: low-dimensional dynamics.
- Geometry, dynamics, and learning are isomorphic at three levels.
- Gallego-Miller-Solla elevate the manifold to a computational object.
- CEBRA aligns neural + behavioral manifolds and is the BCI × RL bridge.
- In embodied intelligence: aligning M1 manifold + robot policy latent = natural control.
- Meta-RL + NDT3 is the direction for cross-subject manifold transfer.
- The world model, JEPA, and meta-learning of Human-Like Intelligence all resonate with this view.
References
- Gallego et al. (2017). Neural manifolds for the control of movement. Neuron.
- Gallego et al. (2020). Long-term stability of cortical population dynamics underlying consistent behavior. Nat Neurosci.
- Wang et al. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nat Neurosci.
- Schneider et al. (2023). Learnable latent embeddings for joint behavioral and neural analysis. Nature.
- Sussillo et al. (2015). A neural network that finds a naturalistic solution for the production of muscle activity. Nat Neurosci.