
Open-Source Robot Learning Frameworks

In recent years, rapid progress in robot learning research has produced numerous high-quality open-source frameworks. This article surveys the mainstream robot learning frameworks and manipulation benchmarks, and the scenarios each is best suited to.


Framework Ecosystem Relationships

graph TD
    subgraph Environments["Simulation Environments"]
        RS[robosuite<br/>MuJoCo Manipulation Tasks]
        MS[ManiSkill<br/>SAPIEN Manipulation Tasks]
        GR[Gymnasium-Robotics<br/>Standard Environments]
    end

    subgraph Algorithms["Algorithms & Benchmarks"]
        RM[robomimic<br/>Imitation Learning Benchmark]
        LR[LeRobot<br/>Unified Training Framework]
    end

    subgraph RealRobot["Real Robot Deployment"]
        HW[Open-source Hardware<br/>Koch / SO-100 / ALOHA]
    end

    RS --> RM
    RS --> LR
    MS --> LR
    GR --> LR
    RM --> LR
    LR --> HW

    style Environments fill:#e3f2fd
    style Algorithms fill:#e8f5e9
    style RealRobot fill:#fff3e0

Framework Overview

| Framework | Maintainer | Core Function | Simulation Backend | Supported Algorithms | License |
| --- | --- | --- | --- | --- | --- |
| LeRobot | HuggingFace | End-to-end robot learning | Multiple | ACT, Diffusion Policy, TDMPC | Apache 2.0 |
| robomimic | Stanford / UT Austin | Imitation learning benchmark | robosuite | BC, BC-RNN, HBC, IRIS, Diffusion Policy | MIT |
| robosuite | Stanford | Manipulation task simulation | MuJoCo | — (environment framework) | MIT |
| ManiSkill | UC San Diego | Manipulation benchmark | SAPIEN | — (environment framework) | Apache 2.0 |
| Gymnasium-Robotics | Farama Foundation | Standardized environments | MuJoCo | — (environment framework) | MIT |
| Octo | UC Berkeley | Generalist robot policy | Multiple | Octo Transformer | MIT |
| DROID | TRI / multi-university | Large-scale data collection | Real robot | — (data framework) | MIT |

LeRobot (HuggingFace)

LeRobot is a unified robot learning framework launched by HuggingFace, aiming to make robot learning as accessible as NLP/CV.

Core Design

| Feature | Description |
| --- | --- |
| Unified Data Format | LeRobot dataset format, compatible with the HuggingFace Hub |
| Multi-algorithm Support | ACT, Diffusion Policy, TDMPC, VQ-BeT |
| Real Robot Support | Koch v1.1, SO-100, ALOHA, and other open-source arms |
| Recording Tools | Built-in teleoperation recording and data visualization |
| Pretrained Models | Shared pretrained weights on the HuggingFace Hub |

Supported Algorithms

| Algorithm | Type | Core Idea | Paper |
| --- | --- | --- | --- |
| ACT | Imitation learning | CVAE + Transformer with action chunking | Zhao et al., 2023 |
| Diffusion Policy | Imitation learning | Diffusion model generates action sequences | Chi et al., 2023 |
| TDMPC | Model-based RL | Temporal difference learning + MPC | Hansen et al., 2022 |
| VQ-BeT | Imitation learning | VQ-VAE-discretized actions + Transformer | Lee et al., 2024 |
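The "action chunking" idea behind ACT can be illustrated without the model itself: at every step the policy predicts a chunk of several future actions, and overlapping predictions for the same timestep are blended with exponential weights (temporal ensembling). The sketch below uses scalar actions and an illustrative weighting; ACT's exact chunk size and weighting are hyperparameters of the method.

```python
import math

def temporal_ensemble(chunks, t, m=0.1):
    """Blend all chunk predictions that cover timestep t.

    chunks: {start_step: [a0, a1, ...]} -- each value is the action chunk
    predicted at `start_step`; actions are plain floats here for brevity.
    Older predictions receive exponentially smaller weights exp(-m * age).
    """
    weights, actions = [], []
    for start, chunk in chunks.items():
        offset = t - start
        if 0 <= offset < len(chunk):
            weights.append(math.exp(-m * (t - start)))  # age of this prediction
            actions.append(chunk[offset])
    return sum(w * a for w, a in zip(weights, actions)) / sum(weights)

# Two overlapping chunks, predicted at steps 0 and 1; at t=1 both cover it
chunks = {0: [1.0, 1.0, 1.0], 1: [3.0, 3.0, 3.0]}
print(round(temporal_ensemble(chunks, t=1), 2))  # ≈ 2.05
```

Without ensembling, each new chunk would abruptly override the previous one; the weighted blend smooths the executed trajectory across chunk boundaries.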

Usage Example

# Installation
pip install lerobot

# Train an ACT policy (Hydra-style CLI; exact flags vary between LeRobot
# versions, check `python lerobot/scripts/train.py --help` for yours)
python lerobot/scripts/train.py \
    policy=act \
    env=aloha \
    dataset_repo_id=lerobot/aloha_sim_insertion_human \
    training.offline_steps=100000 \
    training.batch_size=8 \
    training.lr=1e-5

# Evaluate a pretrained policy from the Hub
python lerobot/scripts/eval.py \
    -p lerobot/act_aloha_sim_insertion_human \
    eval.n_episodes=50

Data Collection (Real Robot)

# Teleoperation recording with Koch v1.1 arm
python lerobot/scripts/control_robot.py record \
    --robot-path lerobot/configs/robot/koch.yaml \
    --fps 30 \
    --repo-id user/my_dataset \
    --num-episodes 50

# Data visualization
python lerobot/scripts/visualize_dataset.py \
    --repo-id user/my_dataset

Data Format

LeRobot defines a standardized dataset format, stored as a HuggingFace dataset (Parquet tables for states and actions, plus MP4 videos):

dataset/
├── meta/
│   ├── info.json              # Dataset metadata
│   ├── episodes.jsonl         # Per-episode information
│   └── tasks.jsonl            # Task descriptions
├── data/
│   ├── chunk-000/
│   │   └── episode_000000.parquet  # States + actions
│   └── ...
└── videos/
    ├── chunk-000/
    │   ├── observation.images.top/
    │   │   └── episode_000000.mp4
    │   └── ...
    └── ...
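The `meta/episodes.jsonl` file in the layout above is plain JSON Lines, one record per episode, so it can be inspected with the standard library alone. The field names below are illustrative; the authoritative schema is defined by the LeRobot dataset format.

```python
import io
import json

# A minimal episodes.jsonl in memory (fields are illustrative)
records = [
    {"episode_index": 0, "tasks": ["insert the peg"], "length": 400},
    {"episode_index": 1, "tasks": ["insert the peg"], "length": 385},
]
buf = io.StringIO()
for r in records:
    buf.write(json.dumps(r) + "\n")  # JSONL: one JSON object per line

# Reading it back: e.g. total frame count across episodes
buf.seek(0)
episodes = [json.loads(line) for line in buf]
total_frames = sum(e["length"] for e in episodes)
print(total_frames)  # 785
```

The same pattern applies to `tasks.jsonl`; `info.json` is a single ordinary JSON object.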

robomimic

robomimic is an imitation learning benchmark framework jointly developed by Stanford and UT Austin, providing systematic algorithm comparisons.

Core Features

| Feature | Description |
| --- | --- |
| Datasets | Demonstrations at multiple quality levels (proficient-human, multi-human, machine-generated) |
| Rich Algorithms | BC, BC-RNN, HBC, IRIS, Diffusion Policy |
| Standardized Evaluation | Unified evaluation metrics and protocols |
| Simulation Backend | robosuite (MuJoCo) |

Supported Algorithms

| Algorithm | Category | Core Method |
| --- | --- | --- |
| BC | Behavioral cloning | Direct supervised learning |
| BC-RNN | Behavioral cloning | LSTM for temporal dependencies |
| HBC | Hierarchical imitation | Subgoal prediction + low-level policy |
| IRIS | Hierarchical imitation | Subgoal discovery + planning |
| Diffusion Policy | Generative imitation | Diffusion denoising generates actions |
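At its core, BC is plain supervised learning: fit a policy to (observation, action) pairs from demonstrations. A toy sketch with a one-dimensional linear policy fit in closed form (illustrative only; robomimic's BC uses neural networks trained with PyTorch):

```python
# Demonstrations: observation -> expert action (here the expert is a = 2*o)
obs = [0.0, 1.0, 2.0, 3.0]
act = [0.0, 2.0, 4.0, 6.0]

# Least-squares fit of action = w * obs (closed form: w = Σ o*a / Σ o*o)
w = sum(o * a for o, a in zip(obs, act)) / sum(o * o for o in obs)

def policy(o):
    """The 'cloned' policy: just the fitted regressor."""
    return w * o

print(policy(5.0))  # 10.0
```

The deeper variants in the table address what this reduction misses: BC-RNN adds history, HBC/IRIS add subgoal structure, and Diffusion Policy models multi-modal action distributions instead of a single regression target.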

Usage Example

import robomimic.utils.torch_utils as TorchUtils
from robomimic.config import config_factory
from robomimic.scripts.train import train

# Build a default BC config (config_factory is robomimic's config entry point)
config = config_factory(algo_name="bc")
config.train.data = "path/to/dataset.hdf5"
config.train.output_dir = "trained_models/bc_lift"
config.train.num_epochs = 200

# Train (device selection mirrors robomimic's own train script)
device = TorchUtils.get_torch_device(try_to_use_cuda=True)
train(config, device=device)

Key Findings (Paper)

Important conclusions from the robomimic paper:

  1. Data quality > Data quantity: A small amount of expert data outperforms large amounts of non-expert data
  2. History is critical: BC-RNN significantly outperforms BC on multi-modal data
  3. Diffusion Policy is overall best: Best performance on most tasks
  4. Human data is harder to learn: Human teleoperation data has more complex distributions than machine-generated data

For more on imitation learning, see Imitation Learning.


robosuite

robosuite is a MuJoCo-based robot manipulation simulation benchmark providing standardized tasks and robot models.

Built-in Tasks

| Task | Description | Difficulty |
| --- | --- | --- |
| Lift | Lift a cube | Easy |
| Stack | Stack cubes | Medium |
| NutAssembly | Nut assembly | Medium |
| PickPlace | Pick and place | Medium |
| Door | Open a door | Medium |
| Wipe | Wipe a table | Hard |
| TwoArmHandover | Two-arm handover | Hard |

Built-in Robots

| Robot | DOF | Type |
| --- | --- | --- |
| Panda | 7 | Single arm |
| Sawyer | 7 | Single arm |
| UR5e | 6 | Single arm |
| IIWA | 7 | Single arm |
| Jaco | 7 | Single arm |
| Baxter | 7+7 | Dual arm |

import numpy as np
import robosuite as suite

env = suite.make(
    env_name="Lift",
    robots="Panda",
    has_renderer=True,
    has_offscreen_renderer=True,
    use_camera_obs=True,
    camera_names=["frontview", "agentview"],
)

obs = env.reset()
low, high = env.action_spec  # robosuite exposes action bounds, not a gym action_space
for _ in range(1000):
    action = np.random.uniform(low, high)
    obs, reward, done, info = env.step(action)
    env.render()

ManiSkill

ManiSkill is a manipulation benchmark built on the SAPIEN simulator, with a focus on generalization across object instances.

Highlights

| Feature | Description |
| --- | --- |
| Articulated Objects | Based on the PartNet-Mobility dataset, with numerous articulated objects (doors, drawers, etc.) |
| GPU Parallelism | The SAPIEN GPU backend supports massively parallel simulation |
| Vision + Point Cloud | Provides RGB-D and point cloud observations |
| Generalization Evaluation | Train and test on different object instances |
| ManiSkill3 | Latest version, with significantly improved parallel performance |

ManiSkill3 Environments

| Category | Task Examples |
| --- | --- |
| Tabletop manipulation | PickCube, StackCube, PegInsertionSide |
| Articulated objects | OpenCabinetDrawer, OpenCabinetDoor, TurnFaucet |
| Assembly | AssemblingKits, PlugCharger |
| Soft body | PourWater, FillCup |
| Mobile manipulation | OpenDoor (mobile base) |

import gymnasium as gym
import mani_skill.envs  # registers the ManiSkill environments

env = gym.make(
    "PickCube-v1",
    obs_mode="rgbd",           # "state", "rgbd", "pointcloud"
    control_mode="pd_ee_delta_pose",
    num_envs=4096,             # GPU-parallel simulation; observations are batched
)                              # (render_mode="human" is for single-env debugging)
obs, info = env.reset(seed=0)

Gymnasium-Robotics

A collection of standardized robot environments maintained by the Farama Foundation, carrying forward the robotics environments of OpenAI Gym.

Environment List

| Environment Group | Simulation Backend | Task Type |
| --- | --- | --- |
| Fetch | MuJoCo | Reach, Push, Slide, Pick&Place |
| Shadow Hand | MuJoCo | Dexterous hand manipulation (cube rotation, etc.) |
| Maze | MuJoCo | Maze navigation |
| Adroit | MuJoCo | Dexterous hand (door opening, pen twirling, etc.) |

Features

  • HER Compatible: All environments support Hindsight Experience Replay
  • Goal-conditioned: Observations include observation, achieved_goal, desired_goal
  • Standard Interface: Gymnasium API, seamless integration with Stable-Baselines3, etc.

import gymnasium as gym

env = gym.make("FetchPickAndPlace-v3", render_mode="human")
obs, info = env.reset(seed=42)
# obs["observation"]: robot state
# obs["achieved_goal"]: current object position
# obs["desired_goal"]: target position
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
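The goal-conditioned observation structure above is exactly what makes HER possible: a failed episode can be relabeled with the goal the agent actually achieved, turning sparse failures into usable successes. A minimal sketch of the "final" relabeling strategy, with illustrative episode records and a Gymnasium-Robotics-style sparse reward (0 on success, -1 otherwise):

```python
def sparse_reward(achieved, desired, eps=0.05):
    """Sparse goal reward: 0 within eps of the goal, -1 otherwise."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved, desired)) ** 0.5
    return 0.0 if dist < eps else -1.0

def her_relabel(episode):
    """'Final' strategy: pretend the goal was where the object actually ended up."""
    new_goal = episode[-1]["achieved_goal"]
    return [
        {**step,
         "desired_goal": new_goal,
         "reward": sparse_reward(step["achieved_goal"], new_goal)}
        for step in episode
    ]

# A failed 2-step episode: the object ends at (1, 0), but the goal was (5, 5)
episode = [
    {"achieved_goal": (0.5, 0.0), "desired_goal": (5.0, 5.0), "reward": -1.0},
    {"achieved_goal": (1.0, 0.0), "desired_goal": (5.0, 5.0), "reward": -1.0},
]
relabeled = her_relabel(episode)
print(relabeled[-1]["reward"])  # 0.0: the final step now counts as a success
```

In practice the relabeled transitions are added to the replay buffer alongside the originals; Gymnasium-Robotics environments expose `compute_reward()` so off-the-shelf HER implementations can recompute rewards the same way.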

Framework Selection Guide

| Requirement | Recommended Framework | Rationale |
| --- | --- | --- |
| End-to-end training + real deployment | LeRobot | Unified data/training/deployment pipeline |
| Imitation learning algorithm comparison | robomimic | Standardized benchmark + multiple algorithms |
| Manipulation task simulation development | robosuite | Rich tasks + stable MuJoCo backend |
| Generalizable manipulation research | ManiSkill | Articulated objects + GPU-parallel simulation |
| Standard RL benchmark | Gymnasium-Robotics | Standard interface + large community |
| Large-scale policy pretraining | Octo / OpenVLA | Multi-dataset pretraining |
| Low-cost real-robot entry | LeRobot + Koch/SO-100 | Low cost + active community |

