
Overview of Robot Learning

Why Robot Learning Is Needed

Traditional robots rely on hand-programmed rules and controllers, performing well in structured environments such as factory production lines. However, when robots face unstructured environments — such as home kitchens, outdoor terrain, or human collaboration — manually writing rules becomes infeasible. The core goal of Robot Learning is to enable robots to autonomously acquire behavioral capabilities from data and experience.

Robot Learning vs. Standard Machine Learning

Robot learning differs fundamentally from standard machine learning:

| Dimension | Standard ML (e.g., CV/NLP) | Robot Learning |
| --- | --- | --- |
| Data Scale | Billions of samples (ImageNet, Common Crawl) | Hundreds to thousands of demonstrations |
| Data Acquisition | Web crawling/annotation, low cost | Teleoperation/real-robot collection, extremely high cost |
| Feedback Delay | Immediate loss function | Evaluation only after physical execution |
| Safety | Prediction errors are cheap | Wrong actions may damage the robot or environment |
| Real-time Requirements | Batch inference acceptable | Control frequencies of 10–1000 Hz |
| State Space | i.i.d. samples | Temporally correlated, partially observable |
| Distribution Shift | Test set close to training set | Deployment environment changes continuously |

These differences have led robot learning to develop a unique methodological framework.


Classification of Robot Learning Methods

graph TD
    A[Robot Learning Methods] --> B[Imitation Learning]
    A --> C[Reinforcement Learning]
    A --> D[Self-Supervised Learning]
    A --> E[Foundation Model Based]

    B --> B1[Behavioral Cloning BC]
    B --> B2[Inverse RL IRL]
    B --> B3[DAgger]
    B --> B4[Diffusion Policy]

    C --> C1[Model-Free RL<br/>SAC / PPO]
    C --> C2[Model-Based RL<br/>Dreamer / MBPO]
    C --> C3[Sim2Real<br/>Domain Randomization / Adaptation]
    C --> C4[Offline RL<br/>CQL / IQL]

    D --> D1[Contrastive Learning<br/>Time-Contrastive]
    D --> D2[Predictive Learning<br/>Forward Model]
    D --> D3[Masked Autoencoding<br/>MAE for Robotics]

    E --> E1[VLA Models<br/>RT-2 / OpenVLA]
    E --> E2[World Models<br/>UniSim / Genie]
    E --> E3[LLM Planners<br/>SayCan / Code-as-Policy]
    E --> E4[Visual Foundation Models<br/>DINOv2 / SAM]

    style A fill:#e1f5fe
    style B fill:#fff3e0
    style C fill:#e8f5e9
    style D fill:#f3e5f5
    style E fill:#fce4ec

Four Major Paradigms in Detail

1. Imitation Learning

Core Idea: Learn a policy \(\pi_\theta(a|o)\) from expert demonstrations without designing a reward function.

Mathematical Framework: Given an expert demonstration dataset \(\mathcal{D} = \{(o_i, a_i^*)\}_{i=1}^N\), the objective is to minimize the discrepancy between the policy and the expert:

\[ \min_\theta \mathbb{E}_{(o, a^*) \sim \mathcal{D}} \left[ \mathcal{L}(\pi_\theta(o), a^*) \right] \]

The choice of loss function \(\mathcal{L}\) depends on the action space:

  • Continuous actions: MSE loss \(\|\pi_\theta(o) - a^*\|^2\)
  • Discrete actions: Cross-entropy loss \(-\sum_a a^* \log \pi_\theta(a|o)\)
  • Multimodal actions: Diffusion model loss, Gaussian mixture loss
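For the continuous-action case, behavioral cloning with the MSE loss is plain supervised regression. A minimal sketch: the linear "expert" controller and linear policy below are illustrative assumptions, not a method from the text, chosen so the regression has a closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a linear feedback controller a* = K o (unknown
# to the learner; used only to generate demonstrations).
K_expert = np.array([[0.5, -1.0], [2.0, 0.3]])
obs = rng.normal(size=(1000, 2))          # observations o_i
acts = obs @ K_expert.T                   # expert actions a*_i

# Behavioral cloning with the MSE loss: for a linear policy
# pi_theta(o) = theta^T o, the minimizer is ordinary least squares.
theta, *_ = np.linalg.lstsq(obs, acts, rcond=None)
bc_mse = float(np.mean((obs @ theta - acts) ** 2))  # ~0 on noiseless data
```

With a neural-network policy the same objective is minimized by gradient descent instead of a closed-form solve, but the loss is unchanged.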

Advantages and Limitations:

  • Advantages: Direct, efficient, no reward design needed
  • Limitations: Distribution shift (compounding error), high data collection cost

See Imitation Learning for details.

2. Reinforcement Learning

Core Idea: Maximize cumulative reward \(\mathbb{E}\left[\sum_{t=0}^T \gamma^t r(s_t, a_t)\right]\) through trial-and-error interaction.
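The discounted return inside this expectation can be computed with a backward recursion over a reward sequence; a small illustrative helper (the function name is my own, not from the text):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute sum_t gamma^t * r_t via the backward recursion
    G_t = r_t + gamma * G_{t+1}, starting from G_T = 0."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

# Example: rewards [1, 1, 1] with gamma = 0.5 give 1 + 0.5 + 0.25 = 1.75.
```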

Key Challenges:

  • Sample Efficiency: Model-free RL on real robots requires millions of interaction steps, which is impractical
  • Reward Engineering: Designing dense rewards for complex manipulation tasks is extremely difficult
  • Safety Constraints: Dangerous actions must be avoided during exploration

Solutions:

  • Simulation Training + Sim2Real Transfer: Massively parallel training in simulation, transferring to real environments via domain randomization
  • Offline RL: Learning from a fixed dataset without online interaction
  • Reward Learning: Automatically inferring rewards from human preferences or language descriptions

See Reinforcement Learning in Robotics for details.

3. Self-Supervised Learning

Core Idea: Learn useful representations from unlabeled interaction data, reducing dependence on human annotations.

Typical Methods:

Time-Contrastive Learning: Leveraging the temporal structure of video to map temporally close frames to nearby points in the representation space:

\[ \mathcal{L}_{\text{TCN}} = -\log \frac{\exp(\text{sim}(z_t, z_{t+k}) / \tau)}{\sum_{j} \exp(\text{sim}(z_t, z_j) / \tau)} \]
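The loss above is an InfoNCE-style objective over frame embeddings. A minimal NumPy sketch, assuming the encoder has already produced one embedding per frame and treating all other frames in the sequence as negatives (one of several possible negative-sampling choices):

```python
import numpy as np

def tcn_loss(z, k=1, tau=0.1):
    """Time-contrastive (InfoNCE) loss over one embedded video.

    z   : (T, d) array of frame embeddings.
    k   : temporal offset; frame t+k is the positive for frame t.
    tau : softmax temperature.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = (z @ z.T) / tau                              # (T, T) logits
    losses = []
    for t in range(len(z) - k):
        logits = np.delete(sim[t], t)                  # drop the self-pair
        # after deleting index t, the positive t+k sits at index t+k-1
        log_denom = np.log(np.exp(logits).sum())
        losses.append(-(logits[t + k - 1] - log_denom))
    return float(np.mean(losses))
```

Each term is a negative log-softmax, so the loss is always non-negative and is minimized when each frame is most similar to its temporal neighbor.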

Forward Prediction Model: Learning to predict the effect of actions on states:

\[ \hat{s}_{t+1} = f_\theta(s_t, a_t), \quad \mathcal{L} = \|s_{t+1} - \hat{s}_{t+1}\|^2 \]
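When the dynamics happen to be linear, minimizing this squared prediction error is ordinary least squares. A toy sketch with made-up dynamics matrices (purely illustrative; real forward models are typically neural networks trained by gradient descent on the same loss):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth linear dynamics s' = A s + B a, unknown to the learner.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

S = rng.normal(size=(500, 2))             # sampled states s_t
U = rng.normal(size=(500, 1))             # sampled actions a_t
S_next = S @ A_true.T + U @ B_true.T      # observed next states s_{t+1}

# Fit f_theta(s, a) = W^T [s; a] by minimizing ||s_{t+1} - f_theta||^2.
X = np.hstack([S, U])                     # (500, 3) regressor [s; a]
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)
mse = float(np.mean((X @ W - S_next) ** 2))  # ~0 on noiseless data
```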

Masked Autoencoding: Applying the MAE paradigm to robotics, learning representations by reconstructing masked sensory inputs.
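The core mechanism is random masking of input patches before encoding. A minimal sketch of just the masking step (the function and its defaults are my own; the encoder/decoder that would consume its output are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(patches, mask_ratio=0.75, rng=rng):
    """Split patch tokens into visible and masked sets, MAE-style.

    patches : (N, d) array of patch embeddings.
    Returns the visible patches (fed to the encoder) and the boolean
    mask, so a decoder can be trained to reconstruct masked entries.
    """
    n = len(patches)
    n_mask = int(round(n * mask_ratio))
    idx = rng.permutation(n)
    mask = np.zeros(n, dtype=bool)
    mask[idx[:n_mask]] = True
    return patches[~mask], mask
```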

4. Foundation Model Based

Core Idea: Leveraging large models (LLMs, VLMs) pretrained on massive data to provide robots with semantic understanding, commonsense reasoning, and task planning capabilities.

Key Paradigms:

  • VLA Models (Vision-Language-Action): End-to-end mapping from visual-language inputs to robot actions
    • Representatives: RT-2, OpenVLA, \(\pi_0\)
  • LLM as Planner: Using LLM reasoning capabilities to decompose tasks
    • Representatives: SayCan, Code-as-Policies, Inner Monologue
  • World Models: Learning generative models of environment dynamics for imaginative planning
    • Representatives: UniSim, Genie, DIAMOND

Evolution of Learning Paradigms

timeline
    title Key Milestones in Robot Learning
    1989 : Pomerleau ALVINN<br/>First neural network end-to-end driving
    2004 : Abbeel Apprenticeship Learning<br/>Helicopter acrobatics
    2013 : DQN<br/>Deep RL breakthrough on Atari
    2016 : Levine et al.<br/>Large-scale grasping learning
    2018 : OpenAI Dactyl<br/>Dexterous hand manipulation
    2020 : DAgger + BC<br/>Industrial-grade imitation learning
    2022 : RT-1<br/>Robot foundation models
    2023 : RT-2 / Diffusion Policy<br/>VLA models and diffusion-based policies
    2024 : pi0 / OpenVLA<br/>VLA model wave
    2025 : Data Flywheel<br/>Open X-Embodiment

Core Challenges

Data Bottleneck

The biggest bottleneck in robot learning is data. A comparison:

  • GPT-4 training data: reportedly on the order of 13 trillion tokens
  • ImageNet: ~14 million images
  • Open X-Embodiment: ~1 million robot trajectories (currently the largest)
  • Typical lab datasets: hundreds to thousands of trajectories

Data scarcity has driven unique methodological needs:

  1. Data-efficient algorithms: Few-shot learning, meta-learning
  2. Data augmentation: Simulation generation, viewpoint transformations
  3. Data sharing: Cross-robot, cross-task data reuse
  4. Synthetic data: Generating training data using simulators and generative models

Safety

Robots execute actions in the physical world, where errors can be irreversible. Safety constraints manifest in:

  • Training phase: Avoiding dangerous actions during exploration (constrained RL, safe sets)
  • Deployment phase: Real-time anomaly monitoring, triggering safety stops
  • Formal guarantees: Control barrier functions (CBF), Lyapunov stability

Real-time Requirements

The robot control loop demands low-latency inference:

| Task Type | Control Frequency | Inference Latency Requirement |
| --- | --- | --- |
| Quadruped walking | 50–200 Hz | < 5 ms |
| Robotic arm manipulation | 10–50 Hz | < 20 ms |
| Dexterous hand manipulation | 100–1000 Hz | < 1 ms |
| Navigation | 5–20 Hz | < 50 ms |

This requires models to be lightweight, or to use techniques such as distillation and quantization to compress inference overhead.


Chapter Navigation

This section covers the core methods of robot learning in detail:

| Topic | Content Summary |
| --- | --- |
| Imitation Learning | BC, DAgger, IRL, GAIL, ACT |
| Reinforcement Learning in Robotics | Reward engineering, massively parallel training, asymmetric Actor-Critic |
| Sim2Real | Domain randomization, system identification, domain adaptation, Teacher-Student distillation |
| Teleoperation and Data Collection | ALOHA, UMI, GELLO, data scaling strategies |
| Diffusion Policy | Diffusion Policy, DP3, Consistency Policy |
| Multi-task Learning and Generalization | Multi-task learning, few-shot adaptation, zero-shot transfer, benchmarks |

Connections to Other Chapters

  • Theoretical Foundations \(\leftarrow\) Robotics Fundamentals: Kinematics and dynamics provide state space and action space definitions for learning algorithms
  • Models and Algorithms \(\rightarrow\) Models and Algorithms: VLA models and world models represent the current frontier of learning paradigms
  • Simulation Platforms \(\leftrightarrow\) Simulation Platforms: Simulators are the infrastructure for robot RL and Sim2Real
  • Hardware \(\leftarrow\) Hardware Platforms: Sensors and actuators determine the observation and action spaces

