Open-Source Robot Learning Frameworks
In recent years, the rapid progress of robot learning research has produced numerous high-quality open-source frameworks. This article surveys the mainstream robot learning frameworks and manipulation benchmarks, and the scenarios each is best suited to.
Framework Ecosystem Relationships
```mermaid
graph TD
    subgraph Environments["Simulation Environments"]
        RS[robosuite<br/>MuJoCo Manipulation Tasks]
        MS[ManiSkill<br/>SAPIEN Manipulation Tasks]
        GR[Gymnasium-Robotics<br/>Standard Environments]
    end
    subgraph Algorithms["Algorithms & Benchmarks"]
        RM[robomimic<br/>Imitation Learning Benchmark]
        LR[LeRobot<br/>Unified Training Framework]
    end
    subgraph RealRobot["Real Robot Deployment"]
        HW[Open-source Hardware<br/>Koch / SO-100 / ALOHA]
    end
    RS --> RM
    RS --> LR
    MS --> LR
    GR --> LR
    RM --> LR
    LR --> HW
    style Environments fill:#e3f2fd
    style Algorithms fill:#e8f5e9
    style RealRobot fill:#fff3e0
```
Framework Overview
| Framework | Maintainer | Core Function | Simulation Backend | Supported Algorithms | License |
| --- | --- | --- | --- | --- | --- |
| LeRobot | HuggingFace | End-to-end robot learning | Multiple | ACT, Diffusion Policy, TDMPC | Apache 2.0 |
| robomimic | Stanford / UT Austin | Imitation learning benchmark | robosuite | BC, BC-RNN, HBC, IRIS, Diffusion Policy | MIT |
| robosuite | Stanford | Manipulation task simulation | MuJoCo | — (environment framework) | MIT |
| ManiSkill | UC San Diego | Manipulation benchmark | SAPIEN | — (environment framework) | Apache 2.0 |
| Gymnasium-Robotics | Farama Foundation | Standardized environments | MuJoCo | — (environment framework) | MIT |
| Octo | UC Berkeley | Generalist robot policy | Multiple | Octo Transformer | MIT |
| DROID | TRI / multi-university | Large-scale data collection | Real robot | — (data framework) | MIT |
LeRobot (HuggingFace)
LeRobot is a unified robot learning framework launched by HuggingFace, aiming to make robot learning as accessible as NLP/CV.
Core Design
| Feature | Description |
| --- | --- |
| Unified Data Format | LeRobot dataset format, compatible with the HuggingFace Hub |
| Multi-algorithm Support | ACT, Diffusion Policy, TDMPC, VQ-BeT |
| Real Robot Support | Koch v1.1, SO-100, ALOHA, and other open-source arms |
| Recording Tools | Built-in teleoperation recording and data visualization |
| Pretrained Models | Shared pretrained weights on the HuggingFace Hub |
Supported Algorithms
| Algorithm | Type | Core Idea | Paper |
| --- | --- | --- | --- |
| ACT | Imitation learning | CVAE + Transformer, predicts action chunks | Zhao et al., 2023 |
| Diffusion Policy | Imitation learning | Diffusion model generates action sequences | Chi et al., 2023 |
| TDMPC | Model-based RL | Temporal-difference learning + MPC | Hansen et al., 2022 |
| VQ-BeT | Imitation learning | VQ-VAE discretized actions + Transformer | Lee et al., 2024 |
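ACT's "action chunking" deserves a brief illustration: the policy predicts a whole chunk of future actions at once, and overlapping chunks predicted at different timesteps are blended with exponential weights (temporal ensembling). The sketch below is a minimal, framework-free rendition of that blending step in NumPy; the function name `temporal_ensemble` and the toy data are hypothetical, not LeRobot's API.

```python
import numpy as np

def temporal_ensemble(chunks, t, k=0.01):
    """Blend every chunk's prediction for timestep t with exponential
    weights w_i = exp(-k * i), oldest prediction first (ACT-style)."""
    preds = []
    for start, chunk in chunks:
        # Keep chunks whose horizon covers timestep t.
        if start <= t < start + len(chunk):
            preds.append(chunk[t - start])
    preds = np.stack(preds)                       # (m, action_dim)
    weights = np.exp(-k * np.arange(len(preds)))  # oldest first
    weights /= weights.sum()
    return (weights[:, None] * preds).sum(axis=0)

# Two overlapping chunks of 1-D actions, predicted at t=0 and t=2.
chunks = [
    (0, np.array([[0.0], [1.0], [2.0], [3.0]])),
    (2, np.array([[2.5], [3.5], [4.5], [5.5]])),
]
action = temporal_ensemble(chunks, t=2)  # blends 2.0 and 2.5
```

With a small `k` the weights are nearly uniform, so the executed action is close to the mean of the overlapping predictions; larger `k` trusts older predictions more, which smooths out jitter between consecutive chunks.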
Usage Example
```python
# Installation:
#   pip install lerobot

# Train an ACT policy
from lerobot.scripts.train import train

train(
    dataset_repo_id="lerobot/aloha_sim_insertion_human",
    policy="act",
    env="aloha",
    training_steps=100000,
    batch_size=8,
    lr=1e-5,
)

# Evaluation
from lerobot.scripts.eval import eval_policy

eval_policy(
    pretrained_policy_name_or_path="lerobot/act_aloha_sim_insertion_human",
    env="aloha",
    n_episodes=50,
)
```
Data Collection (Real Robot)
```bash
# Teleoperation recording with a Koch v1.1 arm
python lerobot/scripts/control_robot.py record \
    --robot-path lerobot/configs/robot/koch.yaml \
    --fps 30 \
    --repo-id user/my_dataset \
    --num-episodes 50

# Data visualization
python lerobot/scripts/visualize_dataset.py \
    --repo-id user/my_dataset
```
LeRobot defines a standardized dataset layout, stored as a HuggingFace dataset:

```
dataset/
├── meta/
│   ├── info.json                   # Dataset metadata
│   ├── episodes.jsonl              # Per-episode information
│   └── tasks.jsonl                 # Task descriptions
├── data/
│   ├── chunk-000/
│   │   └── episode_000000.parquet  # States + actions
│   └── ...
└── videos/
    ├── chunk-000/
    │   ├── observation.images.top/
    │   │   └── episode_000000.mp4
    │   └── ...
    └── ...
```
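Because the metadata sits in plain JSON/JSONL files, it can be inspected with nothing but the standard library. The sketch below builds a toy copy of the `meta/` portion of the layout above and reads it back; the field names (`fps`, `episode_index`, etc.) are illustrative assumptions, not a guaranteed schema.

```python
import json
import tempfile
from pathlib import Path

# Build a toy dataset following the meta/ layout above (contents are made up).
root = Path(tempfile.mkdtemp()) / "dataset"
(root / "meta").mkdir(parents=True)
(root / "meta" / "info.json").write_text(
    json.dumps({"fps": 30, "total_episodes": 2})
)
with open(root / "meta" / "episodes.jsonl", "w") as f:
    for i in range(2):
        f.write(json.dumps({"episode_index": i, "length": 100}) + "\n")

# Read the metadata back: one JSON object for info, one per line for episodes.
info = json.loads((root / "meta" / "info.json").read_text())
episodes = [json.loads(line) for line in open(root / "meta" / "episodes.jsonl")]
```

For real datasets, LeRobot's own dataset classes handle the parquet and video chunks; the point here is only that the episode index is human-readable.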
robomimic
robomimic is an imitation learning benchmark framework jointly developed by Stanford and UT Austin, providing systematic algorithm comparisons.
Core Features
| Feature | Description |
| --- | --- |
| Datasets | Demonstrations at multiple quality levels (proficient-human, multi-human, machine-generated) |
| Rich Algorithms | BC, BC-RNN, HBC, IRIS, Diffusion Policy |
| Standardized Evaluation | Unified evaluation metrics and protocols |
| Simulation Backend | robosuite (MuJoCo) |
Supported Algorithms
| Algorithm | Category | Core Method |
| --- | --- | --- |
| BC | Behavioral cloning | Direct supervised learning |
| BC-RNN | Behavioral cloning | LSTM captures temporal dependencies |
| HBC | Hierarchical imitation | Subgoal prediction + low-level policy |
| IRIS | Hierarchical imitation | Subgoal discovery + planning |
| Diffusion Policy | Generative imitation | Diffusion denoising generates actions |
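At its core, plain BC is just supervised regression from observations to demonstrated actions. The sketch below shows that idea stripped to its essentials: a linear policy fit by gradient descent on a mean-squared-error loss over synthetic demonstrations. Everything here (the data, `W`, the learning rate) is a hypothetical toy, not robomimic code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic demonstrations: actions are a fixed linear function of states.
W_true = rng.normal(size=(4, 2))      # 4-D state -> 2-D action
states = rng.normal(size=(512, 4))
actions = states @ W_true

# Behavioral cloning = supervised regression: minimize ||s W - a||^2.
W = np.zeros((4, 2))
lr = 0.01
for _ in range(500):
    pred = states @ W
    grad = states.T @ (pred - actions) / len(states)  # MSE gradient
    W -= lr * grad

mse = np.mean((states @ W - actions) ** 2)  # near zero after training
```

The richer algorithms in the table differ mainly in the function class (LSTM, hierarchical subgoals, diffusion model) and the loss, not in this basic supervised structure.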
Usage Example
```python
from robomimic.config import config_factory
from robomimic.scripts.train import train
import robomimic.utils.torch_utils as TorchUtils

# Build a default config for behavioral cloning
config = config_factory(algo_name="bc")
config.train.data = "path/to/dataset.hdf5"
config.train.output_dir = "trained_models/bc_lift"
config.train.num_epochs = 200

# Train
device = TorchUtils.get_torch_device(try_to_use_cuda=True)
train(config, device=device)
```
Key Findings (Paper)
Important conclusions from the robomimic paper:
- Data quality > Data quantity: A small amount of expert data outperforms large amounts of non-expert data
- History is critical: BC-RNN significantly outperforms BC on multi-modal data
- Diffusion Policy is overall best: Best performance on most tasks
- Human data is harder to learn: Human teleoperation data has more complex distributions than machine-generated data
For more on imitation learning, see Imitation Learning.
robosuite
robosuite is a MuJoCo-based robot manipulation simulation benchmark providing standardized tasks and robot models.
Built-in Tasks
| Task | Description | Difficulty |
| --- | --- | --- |
| Lift | Lift a cube | Easy |
| Stack | Stack cubes | Medium |
| NutAssembly | Fit nuts onto pegs | Medium |
| PickPlace | Pick and place objects | Medium |
| Door | Open a door | Medium |
| Wipe | Wipe a table | Hard |
| TwoArmHandover | Hand an object between two arms | Hard |
Built-in Robots
| Robot | DOF | Type |
| --- | --- | --- |
| Panda | 7 | Single arm |
| Sawyer | 7 | Single arm |
| UR5e | 6 | Single arm |
| IIWA | 7 | Single arm |
| Jaco | 7 | Single arm |
| Baxter | 7 + 7 | Dual arm |
```python
import numpy as np
import robosuite as suite

env = suite.make(
    env_name="Lift",
    robots="Panda",
    has_renderer=True,
    has_offscreen_renderer=True,
    use_camera_obs=True,
    camera_names=["frontview", "agentview"],
)

obs = env.reset()
for _ in range(1000):
    # robosuite exposes action bounds via env.action_spec, not a gym action_space
    low, high = env.action_spec
    action = np.random.uniform(low, high)
    obs, reward, done, info = env.step(action)
    env.render()
```
ManiSkill
ManiSkill is based on the SAPIEN simulator, focusing on generalizable manipulation task benchmarks.
Highlights
| Feature | Description |
| --- | --- |
| Articulated Objects | Built on the PartNet-Mobility dataset, with many articulated objects (doors, drawers, etc.) |
| GPU Parallelism | SAPIEN's GPU backend supports massively parallel simulation |
| Vision + Point Cloud | Provides RGB-D and point cloud observations |
| Generalization Evaluation | Train and test on different object instances |
| ManiSkill3 | Latest version, with significantly improved parallel performance |
ManiSkill3 Environments
| Category | Task Examples |
| --- | --- |
| Tabletop manipulation | PickCube, StackCube, PegInsertionSide |
| Articulated objects | OpenCabinetDrawer, OpenCabinetDoor, TurnFaucet |
| Assembly | AssemblingKits, PlugCharger |
| Soft body | PourWater, FillCup |
| Mobile manipulation | OpenDoor (mobile base) |
```python
import gymnasium as gym
import mani_skill.envs  # registers ManiSkill environments with gymnasium

env = gym.make(
    "PickCube-v1",
    obs_mode="rgbd",                  # "state", "rgbd", or "pointcloud"
    control_mode="pd_ee_delta_pose",
    render_mode="human",
    num_envs=4096,                    # GPU-parallel environments
)
```
Gymnasium-Robotics
A standardized collection of robot environments maintained by the Farama Foundation, carrying forward the robotics environments of OpenAI Gym.
Environment List
| Environment Group | Simulation Backend | Task Type |
| --- | --- | --- |
| Fetch | MuJoCo | Reach, Push, Slide, Pick&Place |
| Shadow Hand | MuJoCo | Dexterous hand manipulation (cube rotation, etc.) |
| Maze | MuJoCo | Maze navigation |
| Adroit | MuJoCo | Dexterous hand (door opening, pen twirling, etc.) |
Features
- HER Compatible: All environments support Hindsight Experience Replay
- Goal-conditioned: Observations include `observation`, `achieved_goal`, and `desired_goal`
- Standard Interface: Gymnasium API, with seamless integration into Stable-Baselines3 and similar libraries
```python
import gymnasium as gym

env = gym.make("FetchPickAndPlace-v3", render_mode="human")
obs, info = env.reset()
# obs["observation"]:   robot state
# obs["achieved_goal"]: current object position
# obs["desired_goal"]:  target position
```
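The goal-conditioned observation split is exactly what Hindsight Experience Replay exploits: a failed episode is relabeled as if an achieved goal had been the desired one all along, turning it into a success the agent can learn from. The sketch below shows the "final" relabeling strategy on a toy 2-D episode; the function name `sparse_reward` and the data are illustrative, not the Gymnasium-Robotics API.

```python
import numpy as np

def sparse_reward(achieved, desired, tol=0.05):
    """0 on success, -1 otherwise — the sparse convention of the Fetch tasks."""
    return 0.0 if np.linalg.norm(achieved - desired) < tol else -1.0

# A failed episode: the object never reaches the original goal.
achieved_goals = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.2, 0.1])]
desired_goal = np.array([1.0, 1.0])
original = [sparse_reward(ag, desired_goal) for ag in achieved_goals]  # all -1

# HER "final" strategy: pretend the last achieved state was the goal all along.
relabeled_goal = achieved_goals[-1]
relabeled = [sparse_reward(ag, relabeled_goal) for ag in achieved_goals]
# The final transition now counts as a success, giving a learning signal.
```

Because relabeling only needs `achieved_goal` and `desired_goal`, any environment exposing this dict structure works with HER out of the box.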
Framework Selection Guide
| Requirement | Recommended Framework | Rationale |
| --- | --- | --- |
| End-to-end training + real deployment | LeRobot | Unified data/training/deployment pipeline |
| Imitation learning algorithm comparison | robomimic | Standardized benchmark + multiple algorithms |
| Manipulation task simulation development | robosuite | Rich tasks + stable MuJoCo backend |
| Generalizable manipulation research | ManiSkill | Articulated objects + GPU parallelism |
| Standard RL benchmark | Gymnasium-Robotics | Standard interface + large community |
| Large-scale policy pretraining | Octo / OpenVLA | Multi-dataset pretraining |
| Low-cost real-robot entry | LeRobot + Koch/SO-100 | Low cost + active community |