
Neural Manifolds and RL Policy

Neural manifolds and the latent space of an RL policy are the same abstraction, discovered independently in two fields: high-dimensional neural population states evolve on a low-dimensional manifold, and the activations of a high-dimensional policy network likewise encode the task in a low-dimensional representation. This isomorphism opens a mathematical dialogue between BCI and RL.

1. Neural Manifold Recap

See Neural Manifolds and Dynamics.

Core facts

  • M1: ~96 recorded neurons → effective task dimensionality ~10
  • Neural states evolve on a low-dimensional manifold
  • Manifold structure is conserved during learning

2. The Latent Space of RL Policy

Deep RL networks

A typical deep RL policy:

state → encoder (CNN) → latent → policy head → action

  • latent is typically 256- or 512-dimensional
  • But the actual information content is far lower: PCA often finds fewer than 32 effective dimensions (see the sketch below)
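
A minimal way to check this, assuming you have collected latent activations from policy rollouts (the arrays below are synthetic stand-ins):

```python
# Estimate the effective dimensionality of a policy's latent activations.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for rollout data: 10k 512-dim latents that only vary in ~20 directions.
low_rank = rng.normal(size=(10_000, 20)) @ rng.normal(size=(20, 512))
latents = low_rank + 0.01 * rng.normal(size=(10_000, 512))

pca = PCA().fit(latents)
cum_var = np.cumsum(pca.explained_variance_ratio_)
effective_dim = int(np.searchsorted(cum_var, 0.95)) + 1
print(f"dims for 95% variance: {effective_dim}")  # far fewer than 512
```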

Structure of the latent space

  • Similar states cluster
  • Similar actions lie along continuous directions
  • Task structure is embedded in the latent space

This is strikingly similar to the M1 neural manifold.

3. Three Levels of Isomorphism

1. Geometry

  • Neural manifold: PCA to 2D / 3D reveals structure
  • RL latent space: likewise; task, action, and state separate along distinct directions

Sussillo et al. (2015) found that training an RNN to generate muscle activity for a reaching task produced hidden-unit dynamics remarkably similar to those recorded in M1.
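
A minimal sketch of that style of comparison, assuming time-aligned samples from both systems (all data below is synthetic): canonical correlation analysis (CCA) quantifies how much geometry the two populations share.

```python
# Compare RNN hidden states with recorded M1 activity via CCA.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
shared = rng.normal(size=(2_000, 10))                  # shared low-dim dynamics
rnn_hidden = shared @ rng.normal(size=(10, 100)) + 0.1 * rng.normal(size=(2_000, 100))
m1_rates = shared @ rng.normal(size=(10, 96)) + 0.1 * rng.normal(size=(2_000, 96))

cca = CCA(n_components=10).fit(rnn_hidden, m1_rates)
u, v = cca.transform(rnn_hidden, m1_rates)
corrs = [np.corrcoef(u[:, i], v[:, i])[0, 1] for i in range(10)]
print("canonical correlations:", np.round(corrs, 2))   # near 1 => shared geometry
```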

2. Dynamics

  • M1: rotations, attractors, preparation-execution separation
  • Meta-RL RNNs: the same rotational and attractor structure emerges

Wang et al. (2018) showed this for prefrontal meta-RL: training an RNN across a distribution of RL tasks yields hidden dynamics that reproduce key signatures of prefrontal cortex activity.
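
One hedged way to test for rotational structure in either system: fit linear dynamics x_{t+1} = A x_t to latent trajectories and look for complex eigenvalues of A, whose imaginary parts indicate rotation (toy data below).

```python
# Detect rotational dynamics by fitting x_{t+1} = A x_t and eigen-analyzing A.
import numpy as np

rng = np.random.default_rng(2)
theta = 0.1
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])      # toy rotational system
x = np.zeros((500, 2))
x[0] = [1.0, 0.0]
for t in range(499):
    x[t + 1] = rot @ x[t] + 0.01 * rng.normal(size=2)

# Least-squares fit: lstsq solves x[t] @ M = x[t+1], so M = A.T.
M, *_ = np.linalg.lstsq(x[:-1], x[1:], rcond=None)
print("eigenvalues:", np.linalg.eigvals(M.T))          # complex pair => rotation
```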

3. Learning

  • Neural plasticity: STDP changes connections → changes dynamics
  • RL gradients: policy updates → change parameters → change activations

Both are optimization of a parameterized dynamical system.
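
A toy version of that shared picture: gradient descent on the parameters of a small dynamical system x_{t+1} = tanh(W x_t) reshapes its dynamics toward a target state. Illustrative only; real RL gradients come from a policy objective rather than a fixed target.

```python
# Gradient descent on a parameterized dynamical system reshapes its dynamics.
import torch

torch.manual_seed(0)
W = torch.nn.Parameter(0.1 * torch.randn(4, 4))        # the system's parameters
opt = torch.optim.SGD([W], lr=0.1)
x0 = torch.randn(4)
target = torch.full((4,), 0.5)

for step in range(200):
    x = x0
    for _ in range(5):                                 # unroll the dynamics
        x = torch.tanh(W @ x)
    loss = ((x - target) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()                                         # new parameters => new dynamics

print("final loss:", loss.item())
```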

4. The Gallego-Miller-Solla Paradigm

The Gallego-Miller-Solla series (Gallego et al. 2017, Neuron; Gallego et al. 2020, Nat Neurosci):

Key claims

  • Neural manifolds are the objects of computation, not byproducts
  • Manifolds are conserved across individuals and time
  • Manifolds share structure across tasks

Implications for BCI

  • Decoding should be done on the manifold, not on individual neurons
  • Transfer learning is feasible: manifold-level alignment rather than neuron-level matching (see the sketch below)
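
A minimal sketch of what manifold-level alignment means, on synthetic stand-in data (published work uses CCA or Procrustes variants; plain least squares keeps the idea visible): project each subject's activity onto its own manifold, then fit a linear map between the two coordinate systems.

```python
# Align subject B's manifold coordinates with subject A's, then reuse A's decoder.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
latent = rng.normal(size=(1_000, 10))                  # shared task manifold
rates_a = latent @ rng.normal(size=(10, 96))           # subject A, 96 neurons
rates_b = latent @ rng.normal(size=(10, 112))          # subject B, different neurons

man_a = PCA(n_components=10).fit_transform(rates_a)    # manifold coordinates, A
man_b = PCA(n_components=10).fit_transform(rates_b)    # manifold coordinates, B

M, *_ = np.linalg.lstsq(man_b, man_a, rcond=None)      # linear map B -> A
err = np.linalg.norm(man_b @ M - man_a) / np.linalg.norm(man_a)
print(f"relative alignment error: {err:.3f}")          # small => transfer feasible
```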

5. The Role of CEBRA

See CEBRA and Contrastive Learning.

Aligning neural + behavior

CEBRA uses contrastive learning to co-map neural activity and behavior (or time) into a joint manifold (usage sketch after this list):

  • Consistent dimensionality
  • Aligned geometry
  • Facilitates decoding
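
A hedged usage sketch based on the published cebra package (the data files and shapes are hypothetical):

```python
# CEBRA-Behavior: embed neural activity and behavior into one joint manifold.
import numpy as np
from cebra import CEBRA

neural = np.load("m1_rates.npy")      # (timesteps, neurons) -- hypothetical file
behavior = np.load("hand_vel.npy")    # (timesteps, 2) -- hypothetical file

model = CEBRA(
    model_architecture="offset10-model",
    output_dimension=8,               # target manifold dimensionality
    conditional="time_delta",         # behavior-conditional contrastive sampling
    max_iterations=5_000,
    batch_size=512,
)
model.fit(neural, behavior)           # contrastive training on (neural, behavior)
embedding = model.transform(neural)   # (timesteps, 8) joint manifold coordinates
```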

Integration with RL

  • CEBRA's latent space can serve directly as the policy's latent space
  • Training RL in CEBRA space is potentially more sample-efficient than in raw neuron space
  • As of 2024 there is little published work here, but the area is developing quickly

6. Applications in Embodied Intelligence

BCI controlling robots

  • M1 neural signals → CEBRA → intent latent
  • LLM / RL policy → robot control
  • Key design: align the neural latent with the robot policy latent (sketched below)
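
A sketch of that pipeline with every component hypothetical: a frozen CEBRA-style encoder maps M1 activity to an intent latent, and a small policy head (trained separately, e.g. by RL in latent space) maps the intent to robot commands.

```python
# Hypothetical neural -> intent latent -> robot action pipeline.
import numpy as np

def neural_encoder(rates: np.ndarray) -> np.ndarray:
    """Stand-in for a trained CEBRA encoder: (96,) rates -> (8,) intent latent."""
    proj = np.random.default_rng(5).normal(size=(rates.shape[-1], 8))
    return np.tanh(rates @ proj)

def policy_head(latent: np.ndarray) -> np.ndarray:
    """Stand-in for an RL-trained policy: (8,) intent -> (7,) joint velocities."""
    w = 0.1 * np.random.default_rng(6).normal(size=(8, 7))
    return np.clip(latent @ w, -1.0, 1.0)

rates_t = np.random.default_rng(7).normal(size=96)     # one bin of M1 activity
action_t = policy_head(neural_encoder(rates_t))
print("robot command:", np.round(action_t, 2))
```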

Shared representation learning

  • Joint training of BCI + RL
  • Shared intermediate representations
  • End-to-end from neurons to actions

World-model bridging

  • Neural manifold = biological "world-model state"
  • RL policy latent = artificial "world-model state"
  • BCI = the bridge aligning the two

7. Geometric Semantics of the Latent Space

Euclidean vs. manifold

  • Simple methods: assume Euclidean (PCA, linear decoding)
  • More faithful methods: nonlinear manifold learning (LLE, t-SNE, UMAP, Isomap)
  • CEBRA implicitly performs nonlinear alignment

Geodesic distance

  • Distance between two points on a manifold ≠ their Euclidean distance in neuron space
  • Geodesics approximate the paths that neural trajectories actually follow
  • The RL analogue is the natural (Riemannian) policy gradient, which measures updates in the policy's own geometry (see the sketch below)
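
The geodesic-vs-Euclidean gap is easy to see with the Isomap construction (shortest paths in a k-nearest-neighbor graph); the toy points below lie on a half-circle, so the straight-line distance understates the on-manifold distance.

```python
# Geodesic vs. Euclidean distance via shortest paths in a kNN graph.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

theta = np.linspace(0, np.pi, 200)
pts = np.column_stack([np.cos(theta), np.sin(theta)])  # half-circle manifold

graph = kneighbors_graph(pts, n_neighbors=3, mode="distance")
geodesic = shortest_path(graph, directed=False)

i, j = 0, 199                                          # the two endpoints
print(f"Euclidean: {np.linalg.norm(pts[i] - pts[j]):.2f}")  # 2.00 (the chord)
print(f"geodesic:  {geodesic[i, j]:.2f}")                   # ~3.14 (the arc)
```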

8. Joint BCI × RL Research

Offline RL + BCI

  • Neural activity as observation
  • User intent as action
  • Offline RL trains the decoder, analogous to behavior cloning (see the sketch below)
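
The behavior-cloning core of that idea, on synthetic data: regress logged user intent on neural observations. Ridge regression is the simplest instance; offline RL methods generalize it with value and reward terms.

```python
# Decoder training as behavior cloning: neural observations -> intended velocity.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
latent = rng.normal(size=(5_000, 10))
neural_obs = latent @ rng.normal(size=(10, 96)) + 0.5 * rng.normal(size=(5_000, 96))
intent = latent[:, :2]                                 # 2-D velocity intent

X_tr, X_te, y_tr, y_te = train_test_split(neural_obs, intent, random_state=0)
decoder = Ridge(alpha=1.0).fit(X_tr, y_tr)
print(f"held-out R^2: {decoder.score(X_te, y_te):.2f}")
```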

Online RL + BCI

  • The decoder adapts in closed loop while the user acts
  • Task outcomes (target hits, errors) provide the reward signal
  • Closed-loop decoder adaptation (CLDA) is the classical precursor

Meta-RL for BCI

  • Each user is a task
  • Meta-RL learns a universal prior across users
  • NDT3 pretrained across subjects acts as implicit meta-learning (a minimal sketch of the outer loop follows)
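
A hedged sketch of the "each user is a task" outer loop, in the style of Reptile (all data, shapes, and learning rates are synthetic assumptions): a shared decoder initialization is nudged toward each user's fine-tuned weights, yielding a prior that adapts quickly to new users.

```python
# Reptile-style meta-learning of a shared decoder initialization across users.
import numpy as np

rng = np.random.default_rng(9)
shared_map = rng.normal(size=(96, 2))                  # structure common to users
users = [shared_map + 0.1 * rng.normal(size=(96, 2)) for _ in range(5)]

w_meta = np.zeros((96, 2))                             # shared initialization
for epoch in range(50):
    for true_w in users:                               # each user = one task
        X = rng.normal(size=(200, 96))
        y = X @ true_w
        w = w_meta.copy()
        for _ in range(10):                            # inner loop: adapt to user
            w -= 0.01 * X.T @ (X @ w - y) / len(X)
        w_meta += 0.5 * (w - w_meta)                   # Reptile outer update

err = np.linalg.norm(w_meta - shared_map) / np.linalg.norm(shared_map)
print(f"meta-init distance to shared structure: {err:.2f}")  # small => good prior
```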

9. The Human-Like Intelligence Perspective

World model = neural manifold

The Human-Like Intelligence world_model chapter:

  • World models learn internal dynamics
  • Consistent with the neural manifold view

JEPA = the objective of the neural manifold

Yann LeCun's JEPA idea:

  • Don't predict pixels; predict abstract representations
  • Abstract representations = the manifold the brain actually uses

Meta-learning = cross-manifold transfer

Cross-task meta-learning discovers the "manifold structure among tasks" — echoing Gallego-Miller-Solla's cross-individual manifold conservation.

10. Future: From Neuron-Level to Manifold-Level BCI

Traditional BCI

  • Process each neuron separately
  • Decoder = neuron weights

Manifold-level BCI

  • First map data to a manifold
  • Decoding in manifold coordinates
  • Cross-subject transferable (see the sketch below)
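
A compact sketch of the contrast, on synthetic data: instead of fitting one weight per neuron, project onto manifold coordinates first and decode there.

```python
# Manifold-level decoding: PCA to manifold coordinates, then a linear readout.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(10)
latent = rng.normal(size=(3_000, 10))
rates = latent @ rng.normal(size=(10, 96)) + 0.5 * rng.normal(size=(3_000, 96))
velocity = latent[:, :2]                               # behavior to decode

manifold_decoder = make_pipeline(PCA(n_components=10), Ridge(alpha=1.0))
manifold_decoder.fit(rates, velocity)
print(f"R^2 in manifold coordinates: {manifold_decoder.score(rates, velocity):.2f}")
```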

The NDT3 / POYO Direction

Neural foundation models are effectively building a "universal neural manifold":

  • Pretraining across thousands of subjects
  • Shared manifold structure
  • Rapid adaptation to new subjects

11. Logical Chain

  1. Neural manifold + RL policy latent = the same abstraction: low-dimensional dynamics.
  2. Geometry, dynamics, and learning are isomorphic at three levels.
  3. Gallego-Miller-Solla elevate the manifold to a computational object.
  4. CEBRA aligns neural + behavioral manifolds and is the BCI × RL bridge.
  5. In embodied intelligence: aligning M1 manifold + robot policy latent = natural control.
  6. Meta-RL + NDT3 is the direction for cross-subject manifold transfer.
  7. The world model, JEPA, and meta-learning of Human-Like Intelligence all resonate with this view.

References

  • Gallego et al. (2017). Neural manifolds for the control of movement. Neuron.
  • Gallego et al. (2020). Long-term stability of cortical population dynamics underlying consistent behavior. Nat Neurosci.
  • Wang et al. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nat Neurosci.
  • Schneider et al. (2023). Learnable latent embeddings for joint behavioral and neural analysis. Nature.
  • Sussillo et al. (2015). A neural network that finds a naturalistic solution for the production of muscle activity. Nat Neurosci.
