Milestones in Embodied Intelligence
Overview
The development of embodied intelligence spans more than half a century, from early symbolic-reasoning robots to today's foundation-model-driven general-purpose robots. This article traces the key milestones along a timeline, analyzing the technical innovations behind each breakthrough and their impact on the field.
Timeline Overview
```mermaid
timeline
    title History of Embodied Intelligence
    section Early Period (1960s-1990s)
        1969 : Shakey - First general-purpose mobile robot
        1973 : WABOT-1 - First full-scale humanoid robot
        1979 : Stanford Cart - Vision-based navigation pioneer
    section Growth Period (2000s-2010s)
        2000 : ASIMO - Humanoid bipedal walking
        2005 : BigDog - Dynamic quadruped balancing
        2015 : DRC - Disaster response robot competition
    section Explosion Period (2019-Present)
        2019 : OpenAI Rubik's Cube - Dexterous manipulation + Sim2Real
        2022 : RT-1 - Large-scale robot learning
        2023 : RT-2 - VLM-to-VLA transfer
        2024 : Open X-Embodiment + pi0
```
1. Shakey (1969) -- The Dawn of General-Purpose Mobile Robots
Background
Developed by SRI International, Shakey was the world's first general-purpose mobile robot capable of reasoning about its own actions.
Technical Innovations
- STRIPS Planner: The first automated planning system, introducing the precondition-effect formalism for actions
- Perception-Reasoning-Action Loop: Combined AI planning with physical world execution
- Vision-Based Navigation: Used a television camera and bump sensors for environment perception
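The precondition-effect formalism at the heart of STRIPS can be sketched in a few lines. This is an illustrative toy domain, not Shakey's actual code; the operator and predicate names are invented.

```python
# Minimal STRIPS-style operator: a set of preconditions, an add list,
# and a delete list, applied to a state represented as a set of facts.

def applicable(state, op):
    """An operator applies when all its preconditions hold in the state."""
    return op["pre"] <= state

def apply_op(state, op):
    """Successor state: remove the delete list, then add the add list."""
    return (state - op["del"]) | op["add"]

# Toy domain: push a box from room A to room B.
push = {
    "pre": {"robot@A", "box@A"},
    "add": {"robot@B", "box@B"},
    "del": {"robot@A", "box@A"},
}

state = frozenset({"robot@A", "box@A"})
if applicable(state, push):
    state = apply_op(state, push)
print(sorted(state))  # ['box@B', 'robot@B']
```

A planner like STRIPS searches over sequences of such operator applications until the goal facts hold; this snippet shows only the state-transition core.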
Historical Significance
Shakey demonstrated that symbolic reasoning can drive actions in the physical world. The STRIPS planning formalism remains the theoretical foundation of PDDL to this day.
2. WABOT-1 (1973) -- The First Full-Scale Humanoid Robot
Background
Developed at Waseda University in Japan, WABOT-1 was the world's first full-scale humanoid robot.
Technical Innovations
- Bipedal Walking System: Achieved static balance walking, albeit at extremely slow speeds
- Vision System: Used two external cameras for object recognition and distance measurement
- Hand Grasping: Simple grasping driven by tactile sensors
- Language Interaction: Capable of simple conversation in Japanese
Historical Significance
WABOT-1 pioneered the humanoid robot research paradigm, demonstrating the feasibility of building full-scale humanoid systems and laying the groundwork for subsequent research such as ASIMO.
3. Stanford Cart (1979) -- Vision-Based Autonomous Navigation
Background
Developed by Hans Moravec at Stanford University, the Stanford Cart was a representative work in early vision-based navigation.
Technical Innovations
- Stereo Vision: Obtained depth information by capturing images from different positions with a single camera
- Obstacle Detection: Vision-based obstacle avoidance
- Path Planning: Autonomous path planning in obstacle-laden environments
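The Cart's single-camera stereo reduces to the pinhole triangulation relation, where the camera's displacement plays the role of a stereo baseline. The focal length and baseline values below are made up for illustration.

```python
# Depth from two views of a sliding camera (the Cart's "slider stereo" idea):
# Z = f * B / d, with d the pixel shift of the same feature between views.

def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Pinhole triangulation: focal length in pixels, baseline in metres."""
    if disparity_px <= 0:
        raise ValueError("feature must shift between views")
    return f_px * baseline_m / disparity_px

# A feature that shifts 20 px between views 0.5 m apart, with f = 800 px:
z = depth_from_disparity(800.0, 0.5, 20.0)
print(z)  # 20.0 (metres)
```

Nearby obstacles produce large disparities and small depths, which is exactly the signal the Cart used for obstacle detection.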
Historical Significance
Although extremely slow (it took approximately 5 hours to traverse a 20-meter room), the Stanford Cart demonstrated that pure visual information can support autonomous navigation -- an idea that blossomed again 40 years later in Tesla FSD and embodied navigation systems.
4. ASIMO (2000) -- Breakthrough in Humanoid Bipedal Walking
Background
ASIMO (Advanced Step in Innovative Mobility) was a humanoid robot developed by Honda over 14 years of research.
Technical Innovations
- Dynamic Walking: Dynamic balance walking based on the ZMP (Zero Moment Point) criterion: $x_{zmp} = \frac{\sum_i m_i(\ddot{z}_i + g)x_i - \sum_i m_i \ddot{x}_i z_i}{\sum_i m_i(\ddot{z}_i + g)}$
- Stair Climbing: Capable of ascending and descending stairs
- Gesture Recognition: Recognized simple gesture commands
- Autonomous Obstacle Avoidance: Real-time path adjustment
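The ZMP formula above can be evaluated directly for a set of point masses. The masses and positions below are made-up values for illustration, not ASIMO parameters.

```python
# x-coordinate of the Zero Moment Point for point masses m_i at (x_i, z_i)
# with accelerations (x''_i, z''_i), matching the formula in the text.

def zmp_x(masses, xs, zs, ax, az, g=9.81):
    """x_zmp = [sum_i m_i (z''_i + g) x_i - sum_i m_i x''_i z_i]
               / [sum_i m_i (z''_i + g)]"""
    num = sum(m * (azi + g) * xi - m * axi * zi
              for m, xi, zi, axi, azi in zip(masses, xs, zs, ax, az))
    den = sum(m * (azi + g) for m, azi in zip(masses, az))
    return num / den

# Static case (all accelerations zero): the ZMP reduces to the ground
# projection of the centre of mass.
x = zmp_x([30.0, 20.0], [0.1, 0.3], [0.8, 1.2], [0.0, 0.0], [0.0, 0.0])
print(round(x, 4))  # 0.18
```

A ZMP-based controller plans trajectories that keep this point inside the support polygon of the feet, which is what makes the gait dynamically stable.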
Historical Significance
ASIMO demonstrated that humanoid robots can achieve dynamic, stable locomotion in human environments. The ZMP method became the dominant paradigm for humanoid locomotion control for over a decade.
5. BigDog (2005) -- Dynamic Quadruped Locomotion
Background
A quadruped robot developed by Boston Dynamics for the U.S. military.
Technical Innovations
- Dynamic Balancing: Hydraulically driven, capable of maintaining balance on rough terrain
- Disturbance Recovery: Able to recover balance after being kicked (the iconic demonstration video)
- Terrain Adaptation: Adapted to ice, slopes, gravel, and various other terrains
- Load Capacity: Could carry approximately 150 kg of payload
Historical Significance
BigDog demonstrated that robots can achieve near-animal-level dynamic locomotion capabilities, pioneering modern dynamic legged locomotion research and eventually evolving into iconic products like Spot and Atlas.
6. DARPA Robotics Challenge (2015) -- Disaster Response Robots
Background
A robotics competition initiated by DARPA in response to the Fukushima nuclear disaster aftermath, requiring robots to perform tasks such as driving, opening doors, traversing rubble, and closing valves in disaster environments.
Technical Innovations
- Whole-Body Motion Planning: Locomotion in complex unstructured environments
- Human-Robot Collaborative Teleoperation: Combining remote control with autonomous decision-making
- Multimodal Perception Fusion: LiDAR + vision + force sensing
- Multi-Task General Platform: A single platform completing multiple heterogeneous tasks
Key Findings
Most robots frequently failed at simple tasks (such as opening doors), exposing the severe lack of robustness in robot systems at the time -- directly driving the subsequent adoption of learning-based methods.
Historical Significance
DRC demonstrated the limitations of traditional engineering approaches in unstructured environments, marking a critical turning point in robotics from pure engineering toward learning-driven methods.
7. OpenAI Rubik's Cube (2019) -- Sim-to-Real and Dexterous Manipulation
Background
OpenAI used reinforcement learning to train a dexterous hand (Shadow Hand) to solve a Rubik's cube in the real world.
Technical Innovations
- Large-Scale Domain Randomization: Randomized over 100 physical parameters in simulation, optimizing $\pi^* = \arg\max_\pi \mathbb{E}_{\xi \sim P(\xi)} \left[ \sum_t r(s_t, a_t) \right]$, where $\xi$ is the vector of randomized parameters
- Automatic Domain Randomization (ADR): Automatically adjusted randomization ranges
- Memory-Augmented Policy: LSTM policy network to handle partial observability
- Fingertip Manipulation: Fine control of 24 degrees of freedom
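The core loop of domain randomization, with an ADR-flavored range update, can be sketched as below. Parameter names, ranges, and the success threshold are invented for illustration; OpenAI randomized far more quantities with more elaborate curricula.

```python
import random

# Domain randomization sketch: each training episode samples physics
# parameters from ranges, and ADR-style logic widens a range once the
# policy handles it reliably.

ranges = {"friction": [0.9, 1.1], "cube_mass_kg": [0.08, 0.12]}

def sample_params(rng):
    """Draw one set of physics parameters for an episode."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

def adr_update(name, success_rate, step=0.05, threshold=0.8):
    """Widen a parameter's range when the policy succeeds often enough."""
    if success_rate >= threshold:
        ranges[name][0] -= step
        ranges[name][1] += step

rng = random.Random(0)
params = sample_params(rng)
assert ranges["friction"][0] <= params["friction"] <= ranges["friction"][1]
adr_update("friction", success_rate=0.9)  # range grows to roughly [0.85, 1.15]
```

The point of the automatic update is that the simulation gets harder exactly as fast as the policy improves, so training never stalls on a fixed distribution.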
Historical Significance
This work demonstrated that Sim-to-Real transfer can solve extremely fine manipulation tasks, and domain randomization became a standard technique for robot RL thereafter. It also revealed a limitation: the computational resources required for training were enormous.
8. RT-1 (2022) -- Large-Scale Robot Learning
Background
The Robotics Transformer, released by Google Research's robotics team, trained on 130k real-world demonstrations.
Technical Innovations
- Tokenized Actions: Discretized continuous actions into tokens
- FiLM-Conditioned EfficientNet: Visual encoder fusing language instructions through FiLM layers: $\text{FiLM}(x) = \gamma(l) \odot x + \beta(l)$
- Large-Scale Real Data: 13 robots, 17 months, 130k+ trajectories
- Multi-Task Learning: A single model handling 700+ tasks
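RT-1's action tokenization discretizes each continuous action dimension into 256 uniform bins, so an arm command becomes a short sequence of integer tokens. The value ranges below are illustrative, not RT-1's actual limits.

```python
# RT-1-style action tokenization: map each continuous action dimension
# in [lo, hi] to one of 256 uniform bins, and back (lossy inverse).

N_BINS = 256

def tokenize(value, lo, hi):
    """Bin index in [0, N_BINS - 1] for a value in [lo, hi]."""
    frac = (value - lo) / (hi - lo)
    return min(N_BINS - 1, max(0, int(frac * N_BINS)))

def detokenize(token, lo, hi):
    """Centre of the bin, back in continuous space."""
    return lo + (token + 0.5) / N_BINS * (hi - lo)

tok = tokenize(0.0, -1.0, 1.0)            # mid-range value
print(tok, round(detokenize(tok, -1.0, 1.0), 4))  # 128 0.0039
```

Once actions are tokens, a Transformer can emit them with the same autoregressive decoding used for text, which is what makes the architecture reusable across 700+ tasks.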
Historical Significance
RT-1 demonstrated the effectiveness of scaling data and model capacity for robot policies, pioneering the study of "Scaling Laws for Robot Learning."
9. RT-2 (2023) -- From VLM to VLA
Background
Google DeepMind fine-tuned a Vision-Language Model (VLM) directly into a Vision-Language-Action model (VLA).
Technical Innovations
- Actions as Text Tokens: Encoded robot actions as natural language token sequences
- VLM Knowledge Transfer: Directly transferred internet-pretrained vision-language knowledge to robot control
- Emergent Reasoning Abilities: Could understand semantic instructions never seen before (e.g., "throw the trash in the trash can")
- Symbolic Reasoning + Physical Manipulation: Unified symbolic reasoning and physical control in a single model
Historical Significance
RT-2 demonstrated that internet knowledge in VLMs can be grounded in the physical world, establishing the VLA paradigm that became the foundational framework for subsequent models like Octo and pi0.
10. Open X-Embodiment (2024) -- Cross-Embodiment Transfer
Background
Jointly released by dozens of research institutions, comprising a dataset covering 22 robot embodiments, more than 1 million real trajectories, and the RT-X models.
Technical Innovations
- Unified Data Format: RLDS (Reinforcement Learning Datasets) standard
- Cross-Robot Transfer: Sharing training data across robots of different morphologies
- Positive Transfer Validation: Experiments demonstrated that cross-embodiment data improves individual robot performance
- Open Ecosystem: Open-source datasets and models
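The unifying idea of the data format is that every episode, whatever the robot, becomes a sequence of (observation, action, reward, done) steps plus embodiment metadata. The sketch below conveys that shape with plain dictionaries; the field names are illustrative, not the exact RLDS schema.

```python
# Episode record in the spirit of a unified cross-embodiment format:
# per-step observation/action/reward/done, plus robot-level metadata.

def make_step(image, instruction, action, reward=0.0, is_last=False):
    return {
        "observation": {"image": image, "instruction": instruction},
        "action": action,
        "reward": reward,
        "is_last": is_last,
    }

episode = {
    "metadata": {"robot_type": "franka", "dataset": "example_lab_2024"},
    "steps": [
        make_step("img_0.png", "pick up the cup", [0.1, 0.0, -0.2]),
        make_step("img_1.png", "pick up the cup", [0.0, 0.0, 0.0], 1.0, True),
    ],
}
assert episode["steps"][-1]["is_last"]
```

Once every lab's data fits one schema, a single training pipeline can mix trajectories from arms, quadrupeds, and mobile bases, which is the precondition for the positive-transfer result above.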
Historical Significance
Open X-Embodiment pioneered the open data ecosystem for embodied intelligence, demonstrating the feasibility of cross-embodiment transfer learning, analogous to the significance of Common Crawl for language models in NLP.
11. pi0 (2024) -- General-Purpose Robot Foundation Model
Background
A general-purpose robot policy model launched by Physical Intelligence.
Technical Innovations
- VLM Backbone: Based on a pretrained VLM as the perception and reasoning foundation
- Flow Matching Action Head: Uses flow matching instead of diffusion for action generation, learning a velocity field $v_\theta(x_t, t) = \frac{dx_t}{dt}$ and integrating $x_1 = x_0 + \int_0^1 v_\theta(x_t, t)\, dt$
- Multi-Task Generalization: A single model performing tasks such as folding clothes, tidying tables, and packing boxes
- Zero-Shot Transfer: Works on unseen scenarios and objects
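The sampling step behind the flow-matching head is just numerical integration of the learned velocity field from noise at $t=0$ to an action at $t=1$. In the sketch below the "network" is a hand-written constant field whose flow is known in closed form; pi0's real action head is a transformer, so this shows only the integration.

```python
# Flow-matching sampling, reduced to its core: Euler-integrate
# dx/dt = v(x, t) from t = 0 to t = 1, as in the formula above.

def integrate(v, x0, n_steps=100):
    """Euler integration of the velocity field v over [0, 1]."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v(x, i * dt)
    return x

# Straight-line flow: constant velocity (target - x0) reaches the target
# exactly at t = 1.
x0, target = 0.0, 0.7
x1 = integrate(lambda x, t: target - x0, x0)
print(round(x1, 6))  # 0.7
```

Compared with diffusion, the field can be integrated in few steps along a near-straight path, which keeps action generation fast enough for real-time control.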
Historical Significance
pi0 represents a new paradigm for general-purpose robot foundation models, successfully bringing the large-scale pretraining + flexible fine-tuning paradigm from language to robotics.
12. Milestone Comparison Summary
| Milestone | Year | What It Proved | Core Methodology |
|---|---|---|---|
| Shakey | 1969 | Symbolic reasoning can drive physical actions | STRIPS planning |
| WABOT-1 | 1973 | Full-scale humanoid robots are feasible | Engineering integration |
| Stanford Cart | 1979 | Vision can support autonomous navigation | Stereo vision |
| ASIMO | 2000 | Humanoid dynamic walking | ZMP control |
| BigDog | 2005 | Animal-level dynamic locomotion | Hydraulics + feedback control |
| DRC | 2015 | Insufficient robustness of traditional methods | Teleoperation + autonomy |
| Rubik's Cube | 2019 | Sim2Real + dexterous manipulation | RL + domain randomization |
| RT-1 | 2022 | Data scaling laws | Transformer + large data |
| RT-2 | 2023 | VLM to VLA transfer | Actions as tokens |
| Open X-Embodiment | 2024 | Cross-embodiment transfer | Open data ecosystem |
| pi0 | 2024 | General robot foundation model | VLM + Flow Matching |
13. Future Outlook
Based on current trends, likely next milestones include:
- Truly General-Purpose Home Robots: Capable of completing various daily tasks in open home environments
- Self-Learning Robots: Acquiring skills through exploration and interaction without human demonstrations
- Multi-Robot Collaboration: Multiple heterogeneous robots cooperatively completing complex tasks
- Long-Term Autonomous Operation: Robots operating continuously in real environments for months without human intervention
References
- Nilsson, N. J., "Shakey the Robot," SRI International Technical Note 323, 1984
- Ahn et al., "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances," 2022
- Brohan et al., "RT-1: Robotics Transformer for Real-World Control at Scale," 2022
- Brohan et al., "RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control," 2023
- Open X-Embodiment Collaboration, "Open X-Embodiment: Robotic Learning Datasets and RT-X Models," 2024
- Black et al., "pi0: A Vision-Language-Action Flow Model for General Robot Control," 2024