Hierarchical Planning: BCI + LLM + Robot
BCI + LLM + robot (BLR for short) is one of the most active AI frontiers of 2024–2026. It takes the low-bandwidth intent extracted by a BCI, has an LLM expand it into a structured plan, and then lets a robot execute it — forming a complete Intention-to-Action pipeline.
1. Architecture Overview
┌───────────────────────────────────────────────────────────┐
│ Brain                                                     │
│ ├ PPC/PFC: high-level intent ("go to kitchen, get water") │
│ └ M1: low-level kinematics                                │
└─────────────────────────────┬─────────────────────────────┘
                              ↓ neural signals
┌───────────────────────────────────────────────────────────┐
│ BCI decoder                                               │
│ ├ NDT3/CEBRA: neural → embedding                          │
│ ├ Speech BCI: neural → words                              │
│ └ Intent classifier: → structured intent                  │
└─────────────────────────────┬─────────────────────────────┘
                              ↓ natural language / structured intent
┌───────────────────────────────────────────────────────────┐
│ LLM planner                                               │
│ ├ parse the intent                                        │
│ ├ decompose into sub-goals                                │
│ ├ generate an action sequence                             │
│ └ error recovery / dialogue clarification                 │
└─────────────────────────────┬─────────────────────────────┘
                              ↓ action sequence (ROS2 / PDDL)
┌───────────────────────────────────────────────────────────┐
│ Robot execution                                           │
│ ├ motion planning (MoveIt, RRT*)                          │
│ ├ visual perception (SAM, CLIP)                           │
│ └ control (PID, MPC)                                      │
└───────────────────────────────────────────────────────────┘
Each layer handles a different granularity of abstraction — this is the essence of hierarchical planning.
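To make the layering concrete, here is a minimal Python skeleton of the pipeline. Every class and function name below is a placeholder for illustration (NDT3/CEBRA, the LLM, and the ROS2/MoveIt stack would sit behind these interfaces in a real system); it is a sketch, not a published API.
```python
# Minimal pipeline skeleton; every name below is a placeholder, not a published API.
from dataclasses import dataclass


@dataclass
class Intent:
    action: str   # e.g. "fetch"
    object: str   # e.g. "water"
    target: str   # e.g. "user"


class BCIDecoder:
    """Neural signals -> structured intent (stand-in for NDT3/CEBRA plus an intent classifier)."""
    def decode(self, neural_window) -> Intent:
        raise NotImplementedError


class LLMPlanner:
    """Structured intent -> ordered list of robot skills (stand-in for an LLM call)."""
    def plan(self, intent: Intent, skills: list[str]) -> list[str]:
        raise NotImplementedError


class RobotExecutor:
    """Executes one named skill at a time (stand-in for a ROS2 / MoveIt stack)."""
    def run(self, skill: str) -> bool:
        raise NotImplementedError


def intention_to_action(decoder: BCIDecoder, planner: LLMPlanner, robot: RobotExecutor,
                        neural_window, skills: list[str]) -> bool:
    intent = decoder.decode(neural_window)   # brain -> intent
    plan = planner.plan(intent, skills)      # intent -> action sequence
    for step in plan:                        # action sequence -> motion
        if not robot.run(step):
            return False                     # failure handling: see Section 5
    return True
```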
2. Why We Need an LLM Layer
Without the LLM
Every single action must be specified in detail via BCI:
- "forward 10 cm" "grasp" "lift" "left 30 cm" ...
- BCI bandwidth is far too low for this, and the user experience is poor
With the LLM
The user says "give me a glass of water", and the LLM expands it into an action sequence of 20+ primitive steps, summarised here as sub-goals:
- identify "water" = the water bottle in the kitchen
- plan motion to the kitchen
- grasp the bottle
- return to the user
- pour water into the user's cup
- deliver it to the user's mouth
The BCI only has to carry semantic-level intent.
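A minimal sketch of that expansion step, assuming a generic chat-completion function `call_llm` and an illustrative skill list; neither is a real API, and the prompt wording is only an example.
```python
# Sketch of intent -> plan expansion; `call_llm` stands in for any chat-completion API.
SKILLS = ["navigate_to", "detect_object", "grasp", "pour", "hand_over"]


def expand_intent(intent_text: str, call_llm) -> list[str]:
    prompt = (
        "You control an assistive robot. Allowed skills: " + ", ".join(SKILLS) + ".\n"
        "Expand the user's intent into an ordered list of skill calls, "
        "one per line, in the form skill(argument).\n"
        f"User intent: {intent_text}"
    )
    reply = call_llm(prompt)
    steps = [line.strip() for line in reply.splitlines() if line.strip()]
    # Keep only steps that start with a known skill (grounding; see Section 5).
    return [s for s in steps if s.split("(")[0] in SKILLS]


# expand_intent("give me a glass of water", call_llm) might return:
#   ["navigate_to(kitchen)", "detect_object(water bottle)", "grasp(water bottle)",
#    "navigate_to(user)", "pour(water bottle, cup)", "hand_over(cup)"]
```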
Core capabilities provided by the LLM
- common-sense reasoning: drinking water is found in the kitchen; the coffee machine produces coffee
- language understanding: resolving vague expressions such as "I'm thirsty"
- error recovery: when the robot reports "there is no water in the kitchen", the LLM suggests an alternative
- multi-turn dialogue: the LLM adapts when the user corrects it
3. Representative Systems
HiCRISP (2023)
Chen et al. introduced HiCRISP (Hierarchical Closed-loop Robotic Intelligent Self-correction Planner):
- the LLM generates task-level plans
- closed-loop monitoring + self-correction
- demonstrated in BCI + robot scenarios
PaLM-E (Google 2023)
A multimodal LLM unifying vision + language + action.
- input: image + user instruction
- output: robot action sequence
- combined with a BCI language interface, it becomes a brain-controlled PaLM-E
RT-2 (Google 2023)
Vision-Language-Action (VLA) model:
- treats robot actions as language tokens
- emits motion commands directly from the LLM
- a BCI can serve as its "text prompt generator"
Voyager (Wang et al. 2023)
LLM as a long-horizon planning agent:
- skill discovery, skill library, self-reflection
- originally designed for Minecraft, but it provides a template for BCI assistance
4. BCI-LLM Interface Design
Interface 1: natural language
BCI → speech/handwriting → LLM
Pros: LLMs accept it natively. Cons: low bandwidth, ~60 words per minute.
Example: the Willett et al. (2023) speech BCI paired with GPT-4.
Interface 2: structured intent
BCI → JSON / slot filling → LLM
{"action": "fetch", "object": "water", "target": "me"}
Pros: short and high-certainty. Cons: the intent vocabulary is limited.
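A sketch of what the receiving side of this interface can look like, assuming a small fixed intent vocabulary. The field names follow the JSON example above; the allowed-value sets are illustrative.
```python
# Validate a decoded intent message before it reaches the LLM planner (vocabulary is illustrative).
import json

ALLOWED_ACTIONS = {"fetch", "move", "open", "call_help"}
ALLOWED_OBJECTS = {"water", "cup", "phone", "door"}


def parse_intent(msg: str) -> dict:
    intent = json.loads(msg)
    if intent.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {intent.get('action')}")
    if intent.get("object") not in ALLOWED_OBJECTS:
        raise ValueError(f"unknown object: {intent.get('object')}")
    return intent


intent = parse_intent('{"action": "fetch", "object": "water", "target": "me"}')
```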
Interface 3: neural embedding
BCI → latent-space vector → LLM (as a soft prompt)
Pros: retains full neural information. Cons: requires trained alignment. Frontier: NeuroLM (2024) attempts to train neural-language alignment directly.
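A sketch of the soft-prompt idea under stated assumptions: a 64-dimensional BCI latent, an LLM embedding width of 4096, and a small trainable projection. The projection is exactly the alignment that has to be learned; this is not NeuroLM's actual architecture.
```python
# Map a BCI latent vector to "virtual tokens" prepended to the LLM input (dimensions are assumptions).
import torch
import torch.nn as nn

NEURAL_DIM = 64        # dimensionality of the BCI latent (assumed)
LLM_EMBED_DIM = 4096   # embedding width of the target LLM (assumed)
N_SOFT_TOKENS = 8      # how many virtual tokens the latent is expanded into


class NeuralSoftPrompt(nn.Module):
    """Trainable projection: neural latent -> a short sequence of LLM input embeddings."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(NEURAL_DIM, N_SOFT_TOKENS * LLM_EMBED_DIM)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, NEURAL_DIM) -> (batch, N_SOFT_TOKENS, LLM_EMBED_DIM)
        return self.proj(z).view(z.shape[0], N_SOFT_TOKENS, LLM_EMBED_DIM)


soft_prompt = NeuralSoftPrompt()
z = torch.randn(1, NEURAL_DIM)       # stand-in for a decoded neural latent
virtual_tokens = soft_prompt(z)      # shape (1, 8, 4096)
# These embeddings would be concatenated in front of the text embeddings,
# e.g. inputs_embeds = torch.cat([virtual_tokens, text_embeds], dim=1).
```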
5. Challenges of Putting LLMs in the Loop
Latency
LLM inference takes 500 ms – 2 s, too slow for real-time interaction. Solution: edge LLMs (Llama-3 / Phi) + cloud GPT hybrid.
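One way to sketch that hybrid: return cached plans immediately, try a small local model next, and fall back to the cloud model only when needed. `edge_llm` and `cloud_llm` below are placeholders for whatever models are actually deployed.
```python
# Edge/cloud routing sketch: cached plans first, small local model next, cloud only as a fallback.
import time

PLAN_CACHE = {"fetch water": ["navigate_to(kitchen)", "grasp(water bottle)", "hand_over(user)"]}


def plan_with_budget(intent_text, edge_llm, cloud_llm, budget_s=0.5):
    if intent_text in PLAN_CACHE:                    # near-zero latency for known intents
        return PLAN_CACHE[intent_text]
    start = time.monotonic()
    plan = edge_llm(intent_text)                     # small on-device model, fast but weaker
    if plan is None or time.monotonic() - start > budget_s:
        plan = cloud_llm(intent_text)                # larger cloud model, 500 ms to 2 s
    PLAN_CACHE[intent_text] = plan
    return plan
```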
Hallucinations
The LLM may invent actions or misplace objects. Solutions (combined in the sketch after this list):
- Grounding: the LLM may only call skills the robot already possesses
- visual verification: use CLIP to confirm the object exists before execution
- user confirmation: require BCI confirmation for critical steps
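A sketch combining the three mitigations around one generated plan. `clip_object_present` and `ask_user_to_confirm` are hypothetical helpers standing in for a CLIP similarity check and a BCI yes/no confirmation; the skill names are illustrative.
```python
# Hallucination guards around an LLM-generated plan (all helpers are placeholders).
SKILL_LIBRARY = {"navigate_to", "detect_object", "grasp", "pour", "hand_over"}
CRITICAL_SKILLS = {"pour", "hand_over"}   # steps that directly affect the user


def vet_plan(plan, scene_image, clip_object_present, ask_user_to_confirm):
    vetted = []
    for step in plan:
        skill, _, arg = step.partition("(")
        arg = arg.rstrip(")")
        if skill not in SKILL_LIBRARY:                           # grounding: drop unknown skills
            continue
        if skill == "grasp" and not clip_object_present(scene_image, arg):
            raise RuntimeError(f"object not visible: {arg}")     # visual verification
        if skill in CRITICAL_SKILLS and not ask_user_to_confirm(step):
            raise RuntimeError(f"user rejected step: {step}")    # BCI confirmation
        vetted.append(step)
    return vetted
```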
Safety
The LLM can be coaxed (or attacked) into issuing dangerous actions. Solution: Constitutional AI-style rule constraints.
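A minimal sketch of such rule constraints, with two illustrative rules checked before any step is executed. The thresholds and field names are assumptions, not a standard.
```python
# Constitutional-style rule check applied to every planned step (rules and fields are illustrative).
RULES = [
    ("speed near the user must stay below 0.25 m/s",
     lambda step: step.get("speed_mps", 0.0) <= 0.25),
    ("sharp objects must never be handed to the user",
     lambda step: not (step.get("skill") == "hand_over" and step.get("object") in {"knife", "scissors"})),
]


def check_rules(step: dict) -> None:
    for rule_text, rule_ok in RULES:
        if not rule_ok(step):
            raise PermissionError(f"blocked by rule: {rule_text}")
```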
6. Training Strategies
SFT: supervised fine-tuning
- collect (BCI intent, LLM plan, robot result) triples
- fine-tune the LLM so it understands BCI scenarios better (record format sketched below)
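A sketch of how collected triples might be serialised into fine-tuning records; the JSONL layout is an assumption and should be adapted to whichever fine-tuning stack is used.
```python
# Serialise (BCI intent, LLM plan, robot result) triples into fine-tuning records (layout is an assumption).
import json


def to_sft_record(intent: dict, plan: list[str], result: str) -> str:
    record = {
        "prompt": f"BCI intent: {json.dumps(intent)}\nProduce a robot plan.",
        "completion": "\n".join(plan),
        "result": result,   # "success" / "failure"; failed executions are usually filtered out
    }
    return json.dumps(record)


line = to_sft_record({"action": "fetch", "object": "water"},
                     ["navigate_to(kitchen)", "grasp(water bottle)"], "success")
```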
RLHF: reinforcement learning from human feedback
- users rate how good each plan is
- PPO optimises the LLM for user preference (preference-pair format sketched below)
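A sketch of the preference data this implies, with an illustrative chosen/rejected pair; the reward model and PPO loop themselves are standard RLHF machinery and are only summarised in the comment.
```python
# Illustrative preference pair: the user preferred the complete plan over the short-cut one.
preference_example = {
    "intent": "give me a glass of water",
    "chosen": ["navigate_to(kitchen)", "grasp(water bottle)", "navigate_to(user)",
               "pour(water bottle, cup)", "hand_over(cup)"],
    "rejected": ["grasp(cup)", "hand_over(cup)"],   # hands over an empty cup
}
# A reward model trained on such pairs scores new plans; PPO then optimises the planner
# against that reward, with a KL penalty keeping it close to the SFT model.
```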
In-context prompting
- give the LLM the current environment + skill-library description (see the prompt-builder sketch after this list)
- zero-shot / few-shot planning
- well suited for rapid iteration
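A sketch of such a prompt builder, assuming the environment is summarised as a list of visible objects and the skill library as a list of names; the prompt wording is illustrative.
```python
# Build a zero-/few-shot planning prompt from the live scene and the skill library (wording is illustrative).
def build_prompt(skills: list[str], scene_objects: list[str],
                 examples: list[tuple[str, str]], intent: str) -> str:
    parts = [
        "You plan actions for an assistive robot controlled through a BCI.",
        "Available skills: " + ", ".join(skills),
        "Objects currently visible: " + ", ".join(scene_objects),
    ]
    for example_intent, example_plan in examples:     # few-shot demonstrations
        parts.append(f"Intent: {example_intent}\nPlan:\n{example_plan}")
    parts.append(f"Intent: {intent}\nPlan:")
    return "\n\n".join(parts)
```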
7. Open-Source Tooling
| Tool | Layer | Function |
|---|---|---|
| MNE / Kilosort | BCI decoding | signal preprocessing / spike sorting |
| NDT3 / CEBRA | BCI decoding | latent space |
| LangChain | LLM | planning, tool calling |
| Voyager / CoT-Robotics | LLM | skill learning |
| ROS2 | robot | communication |
| MoveIt | robot | motion planning |
| SAM / CLIP | vision | segmentation / open-vocabulary object recognition |
8. Regulation and Deployment
Regulating a BLR system is complex:
- BCI layer: FDA / NMPA medical device
- LLM layer: EU AI Act high-risk AI
- robot layer: ISO 10218 (industrial), ISO 13482 (service)
Compliance path: separate certification at each layer + whole-system certification. A fully commercial BLR system is not expected before 2027–2030.
9. Correspondence with Human-Like Intelligence
The BLR pipeline mirrors the Human_Like_Intelligence / world_model / JEPA line of thinking:
| Human-like intelligence | BLR |
|---|---|
| predictive coding (sensation → internal state) | BCI decoding |
| world model (internal state → action) | LLM planning |
| motor control (action → output) | robot execution |
| environmental feedback | vision / haptics loop |
BLR is not a simulation of AGI; it is an engineering model of "read real biological intelligence, then inject artificial intelligence". This complementary structure is why BCI research and human-like-intelligence research converge.
10. Landmark Milestones
- 2022: Microsoft + Synchron demo of Apple Vision OS BCI control
- 2023: UCSF Metzger avatar: BCI → facial motion + speech
- 2024 CES: Synchron + Apple Vision Pro demo
- 2024-Q4: Neuralink patient holds everyday conversations using BCI + voice assistant
- 2026 expected: full BCI + LLM + robotic-arm assisted-living demo
11. Chain of Reasoning
- Insufficient BCI bandwidth means we need a higher-level "expander" — the LLM is the best candidate.
- Hierarchical planning: BCI extracts intent → LLM expands the plan → robot executes.
- Three interface designs (natural language, structured, embedding), each with trade-offs.
- Latency, hallucination, and safety are the core engineering challenges for BLR.
- BLR is the convergence point of BCI work and human-like-intelligence work — the mainstream research direction after 2024.
References
- Chen et al. (2023). HiCRISP: An LLM-driven hierarchical closed-loop robotic intelligent self-correction planner. arXiv:2309.12089.
- Driess et al. (2023). PaLM-E: An embodied multimodal language model. arXiv. https://palm-e.github.io/
- Brohan et al. (2023). RT-2: Vision-language-action models transfer web knowledge to robotic control. CoRL.
- Wang et al. (2023). Voyager: An open-ended embodied agent with large language models. arXiv. https://voyager.minedojo.org/
- Metzger et al. (2023). A high-performance neuroprosthesis for speech decoding and avatar control. Nature. https://www.nature.com/articles/s41586-023-06443-4