Open-Source Robot Learning Frameworks
In recent years, the rapid progress of robot learning research has produced numerous high-quality open-source frameworks. This article surveys the mainstream robot learning frameworks and manipulation benchmarks, and the scenarios each is best suited to.
Framework Ecosystem Relationships
```mermaid
graph TD
    subgraph Environments["Simulation Environments"]
        RS[robosuite<br/>MuJoCo Manipulation Tasks]
        MS[ManiSkill<br/>SAPIEN Manipulation Tasks]
        GR[Gymnasium-Robotics<br/>Standard Environments]
    end
    subgraph Algorithms["Algorithms & Benchmarks"]
        RM[robomimic<br/>Imitation Learning Benchmark]
        LR[LeRobot<br/>Unified Training Framework]
    end
    subgraph RealRobot["Real Robot Deployment"]
        HW[Open-source Hardware<br/>Koch / SO-100 / ALOHA]
    end
    RS --> RM
    RS --> LR
    MS --> LR
    GR --> LR
    RM --> LR
    LR --> HW
    style Environments fill:#e3f2fd
    style Algorithms fill:#e8f5e9
    style RealRobot fill:#fff3e0
```
Framework Overview
| Framework | Maintainer | Core Function | Simulation Backend | Supported Algorithms | License |
| --- | --- | --- | --- | --- | --- |
| LeRobot | HuggingFace | End-to-end robot learning | Multiple | ACT, Diffusion Policy, TDMPC | Apache 2.0 |
| robomimic | Stanford / UT Austin | Imitation learning benchmark | robosuite | BC, BC-RNN, HBC, IRIS, Diffusion Policy | MIT |
| robosuite | Stanford | Manipulation task simulation | MuJoCo | — (environment framework) | MIT |
| ManiSkill | UC San Diego | Manipulation benchmark | SAPIEN | — (environment framework) | Apache 2.0 |
| Gymnasium-Robotics | Farama Foundation | Standardized environments | MuJoCo | — (environment framework) | MIT |
| Octo | UC Berkeley | Generalist robot policy | Multiple | Octo Transformer | MIT |
| DROID | TRI / multi-university | Large-scale data collection | Real robot | — (data framework) | MIT |
LeRobot (HuggingFace)
LeRobot is a unified robot learning framework launched by HuggingFace, aiming to make robot learning as accessible as NLP/CV.
Core Design
| Feature | Description |
| --- | --- |
| Unified Data Format | LeRobot dataset format, compatible with the HuggingFace Hub |
| Multi-algorithm Support | ACT, Diffusion Policy, TDMPC, VQ-BeT |
| Real Robot Support | Koch v1.1, SO-100, ALOHA, and other open-source arms |
| Recording Tools | Built-in teleoperation recording and data visualization |
| Pretrained Models | Shared pretrained weights on the HuggingFace Hub |
Supported Algorithms
| Algorithm | Type | Core Idea | Paper |
| --- | --- | --- | --- |
| ACT | Imitation learning | CVAE + Transformer, predicts action chunks | Zhao et al., 2023 |
| Diffusion Policy | Imitation learning | Diffusion model generates action sequences | Chi et al., 2023 |
| TDMPC | Model-based RL | Temporal-difference learning + MPC | Hansen et al., 2022 |
| VQ-BeT | Imitation learning | VQ-VAE discretized actions + Transformer | Lee et al., 2024 |
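ACT's "action chunking" deserves a brief illustration: the policy predicts a whole chunk of future actions at once, and overlapping chunks predicted at different timesteps are blended with exponential weights (temporal ensembling). The sketch below is a minimal, framework-free rendition of that blending step in NumPy; the function name `temporal_ensemble` and the toy data are hypothetical, not LeRobot's API.

```python
import numpy as np

def temporal_ensemble(chunks, t, k=0.01):
    """Blend every chunk's prediction for timestep t with exponential
    weights w_i = exp(-k * i), oldest prediction first (ACT-style)."""
    preds = []
    for start, chunk in chunks:
        # Keep chunks whose horizon covers timestep t.
        if start <= t < start + len(chunk):
            preds.append(chunk[t - start])
    preds = np.stack(preds)                       # (m, action_dim)
    weights = np.exp(-k * np.arange(len(preds)))  # oldest first
    weights /= weights.sum()
    return (weights[:, None] * preds).sum(axis=0)

# Two overlapping chunks of 1-D actions, predicted at t=0 and t=2.
chunks = [
    (0, np.array([[0.0], [1.0], [2.0], [3.0]])),
    (2, np.array([[2.5], [3.5], [4.5], [5.5]])),
]
action = temporal_ensemble(chunks, t=2)  # blends 2.0 and 2.5
```

With a small `k` the weights are nearly uniform, so the executed action is close to the mean of the overlapping predictions; larger `k` trusts older predictions more, which smooths out jitter between consecutive chunks.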
Usage Example
```python
# Installation:
#   pip install lerobot

# Train an ACT policy
from lerobot.scripts.train import train

train(
    dataset_repo_id="lerobot/aloha_sim_insertion_human",
    policy="act",
    env="aloha",
    training_steps=100000,
    batch_size=8,
    lr=1e-5,
)

# Evaluation
from lerobot.scripts.eval import eval_policy

eval_policy(
    pretrained_policy_name_or_path="lerobot/act_aloha_sim_insertion_human",
    env="aloha",
    n_episodes=50,
)
```
Data Collection (Real Robot)
```bash
# Teleoperation recording with a Koch v1.1 arm
python lerobot/scripts/control_robot.py record \
    --robot-path lerobot/configs/robot/koch.yaml \
    --fps 30 \
    --repo-id user/my_dataset \
    --num-episodes 50

# Data visualization
python lerobot/scripts/visualize_dataset.py \
    --repo-id user/my_dataset
```
LeRobot defines a standardized dataset layout, stored as a HuggingFace dataset:

```
dataset/
├── meta/
│   ├── info.json                   # Dataset metadata
│   ├── episodes.jsonl              # Per-episode information
│   └── tasks.jsonl                 # Task descriptions
├── data/
│   ├── chunk-000/
│   │   └── episode_000000.parquet  # States + actions
│   └── ...
└── videos/
    ├── chunk-000/
    │   ├── observation.images.top/
    │   │   └── episode_000000.mp4
    │   └── ...
    └── ...
```
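Because the metadata sits in plain JSON/JSONL files, it can be inspected with nothing but the standard library. The sketch below builds a toy copy of the `meta/` portion of the layout above and reads it back; the field names (`fps`, `episode_index`, etc.) are illustrative assumptions, not a guaranteed schema.

```python
import json
import tempfile
from pathlib import Path

# Build a toy dataset following the meta/ layout above (contents are made up).
root = Path(tempfile.mkdtemp()) / "dataset"
(root / "meta").mkdir(parents=True)
(root / "meta" / "info.json").write_text(
    json.dumps({"fps": 30, "total_episodes": 2})
)
with open(root / "meta" / "episodes.jsonl", "w") as f:
    for i in range(2):
        f.write(json.dumps({"episode_index": i, "length": 100}) + "\n")

# Read the metadata back: one JSON object for info, one per line for episodes.
info = json.loads((root / "meta" / "info.json").read_text())
episodes = [json.loads(line) for line in open(root / "meta" / "episodes.jsonl")]
```

For real datasets, LeRobot's own dataset classes handle the parquet and video chunks; the point here is only that the episode index is human-readable.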
robomimic
robomimic is an imitation learning benchmark framework jointly developed by Stanford and UT Austin, providing systematic algorithm comparisons.
Core Features
| Feature | Description |
| --- | --- |
| Datasets | Demonstrations at multiple quality levels (proficient-human, multi-human, machine-generated) |
| Rich Algorithms | BC, BC-RNN, HBC, IRIS, Diffusion Policy |
| Standardized Evaluation | Unified evaluation metrics and protocols |
| Simulation Backend | robosuite (MuJoCo) |
Supported Algorithms
| Algorithm | Category | Core Method |
| --- | --- | --- |
| BC | Behavioral cloning | Direct supervised learning |
| BC-RNN | Behavioral cloning | LSTM captures temporal dependencies |
| HBC | Hierarchical imitation | Subgoal prediction + low-level policy |
| IRIS | Hierarchical imitation | Subgoal discovery + planning |
| Diffusion Policy | Generative imitation | Diffusion denoising generates actions |
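At its core, plain BC is just supervised regression from observations to demonstrated actions. The sketch below shows that idea stripped to its essentials: a linear policy fit by gradient descent on a mean-squared-error loss over synthetic demonstrations. Everything here (the data, `W`, the learning rate) is a hypothetical toy, not robomimic code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic demonstrations: actions are a fixed linear function of states.
W_true = rng.normal(size=(4, 2))      # 4-D state -> 2-D action
states = rng.normal(size=(512, 4))
actions = states @ W_true

# Behavioral cloning = supervised regression: minimize ||s W - a||^2.
W = np.zeros((4, 2))
lr = 0.01
for _ in range(500):
    pred = states @ W
    grad = states.T @ (pred - actions) / len(states)  # MSE gradient
    W -= lr * grad

mse = np.mean((states @ W - actions) ** 2)  # near zero after training
```

The richer algorithms in the table differ mainly in the function class (LSTM, hierarchical subgoals, diffusion model) and the loss, not in this basic supervised structure.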
Usage Example
```python
from robomimic.config import config_factory
from robomimic.scripts.train import train
import robomimic.utils.torch_utils as TorchUtils

# Build a default config for behavioral cloning
config = config_factory(algo_name="bc")
config.train.data = "path/to/dataset.hdf5"
config.train.output_dir = "trained_models/bc_lift"
config.train.num_epochs = 200

# Train
device = TorchUtils.get_torch_device(try_to_use_cuda=True)
train(config, device=device)
```
Key Findings (Paper)
Important conclusions from the robomimic paper:
- Data quality > Data quantity: A small amount of expert data outperforms large amounts of non-expert data
- History is critical: BC-RNN significantly outperforms BC on multi-modal data
- Diffusion Policy is overall best: Best performance on most tasks
- Human data is harder to learn: Human teleoperation data has more complex distributions than machine-generated data
For more on imitation learning, see Imitation Learning.
robosuite
robosuite is a MuJoCo-based robot manipulation simulation benchmark providing standardized tasks and robot models.
Built-in Tasks
| Task | Description | Difficulty |
| --- | --- | --- |
| Lift | Lift a cube | Easy |
| Stack | Stack cubes | Medium |
| NutAssembly | Fit nuts onto pegs | Medium |
| PickPlace | Pick and place objects | Medium |
| Door | Open a door | Medium |
| Wipe | Wipe a table | Hard |
| TwoArmHandover | Hand an object between two arms | Hard |
Built-in Robots
| Robot | DOF | Type |
| --- | --- | --- |
| Panda | 7 | Single arm |
| Sawyer | 7 | Single arm |
| UR5e | 6 | Single arm |
| IIWA | 7 | Single arm |
| Jaco | 7 | Single arm |
| Baxter | 7 + 7 | Dual arm |
```python
import numpy as np
import robosuite as suite

env = suite.make(
    env_name="Lift",
    robots="Panda",
    has_renderer=True,
    has_offscreen_renderer=True,
    use_camera_obs=True,
    camera_names=["frontview", "agentview"],
)

obs = env.reset()
for _ in range(1000):
    # robosuite exposes action bounds via env.action_spec, not a gym action_space
    low, high = env.action_spec
    action = np.random.uniform(low, high)
    obs, reward, done, info = env.step(action)
    env.render()
```
ManiSkill
ManiSkill is based on the SAPIEN simulator, focusing on generalizable manipulation task benchmarks.
Highlights
| Feature | Description |
| --- | --- |
| Articulated Objects | Built on the PartNet-Mobility dataset, with many articulated objects (doors, drawers, etc.) |
| GPU Parallelism | SAPIEN's GPU backend supports massively parallel simulation |
| Vision + Point Cloud | Provides RGB-D and point cloud observations |
| Generalization Evaluation | Train and test on different object instances |
| ManiSkill3 | Latest version, with significantly improved parallel performance |
ManiSkill3 Environments
| Category | Task Examples |
| --- | --- |
| Tabletop manipulation | PickCube, StackCube, PegInsertionSide |
| Articulated objects | OpenCabinetDrawer, OpenCabinetDoor, TurnFaucet |
| Assembly | AssemblingKits, PlugCharger |
| Soft body | PourWater, FillCup |
| Mobile manipulation | OpenDoor (mobile base) |
```python
import gymnasium as gym
import mani_skill.envs  # registers ManiSkill environments with gymnasium

env = gym.make(
    "PickCube-v1",
    obs_mode="rgbd",                  # "state", "rgbd", or "pointcloud"
    control_mode="pd_ee_delta_pose",
    render_mode="human",
    num_envs=4096,                    # GPU-parallel environments
)
```
Gymnasium-Robotics
A standardized collection of robot environments maintained by the Farama Foundation, carrying forward the robotics environments of OpenAI Gym.
Environment List
| Environment Group | Simulation Backend | Task Type |
| --- | --- | --- |
| Fetch | MuJoCo | Reach, Push, Slide, Pick&Place |
| Shadow Hand | MuJoCo | Dexterous hand manipulation (cube rotation, etc.) |
| Maze | MuJoCo | Maze navigation |
| Adroit | MuJoCo | Dexterous hand (door opening, pen twirling, etc.) |
Features
- HER Compatible: All environments support Hindsight Experience Replay
- Goal-conditioned: Observations include `observation`, `achieved_goal`, and `desired_goal`
- Standard Interface: Gymnasium API, with seamless integration into Stable-Baselines3 and similar libraries
```python
import gymnasium as gym

env = gym.make("FetchPickAndPlace-v3", render_mode="human")
obs, info = env.reset()
# obs["observation"]:   robot state
# obs["achieved_goal"]: current object position
# obs["desired_goal"]:  target position
```
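The goal-conditioned observation split is exactly what Hindsight Experience Replay exploits: a failed episode is relabeled as if an achieved goal had been the desired one all along, turning it into a success the agent can learn from. The sketch below shows the "final" relabeling strategy on a toy 2-D episode; the function name `sparse_reward` and the data are illustrative, not the Gymnasium-Robotics API.

```python
import numpy as np

def sparse_reward(achieved, desired, tol=0.05):
    """0 on success, -1 otherwise — the sparse convention of the Fetch tasks."""
    return 0.0 if np.linalg.norm(achieved - desired) < tol else -1.0

# A failed episode: the object never reaches the original goal.
achieved_goals = [np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.2, 0.1])]
desired_goal = np.array([1.0, 1.0])
original = [sparse_reward(ag, desired_goal) for ag in achieved_goals]  # all -1

# HER "final" strategy: pretend the last achieved state was the goal all along.
relabeled_goal = achieved_goals[-1]
relabeled = [sparse_reward(ag, relabeled_goal) for ag in achieved_goals]
# The final transition now counts as a success, giving a learning signal.
```

Because relabeling only needs `achieved_goal` and `desired_goal`, any environment exposing this dict structure works with HER out of the box.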
Framework Selection Guide
| Requirement | Recommended Framework | Rationale |
| --- | --- | --- |
| End-to-end training + real deployment | LeRobot | Unified data/training/deployment pipeline |
| Imitation learning algorithm comparison | robomimic | Standardized benchmark + multiple algorithms |
| Manipulation task simulation development | robosuite | Rich tasks + stable MuJoCo backend |
| Generalizable manipulation research | ManiSkill | Articulated objects + GPU parallelism |
| Standard RL benchmark | Gymnasium-Robotics | Standard interface + large community |
| Large-scale policy pretraining | Octo / OpenVLA | Multi-dataset pretraining |
| Low-cost real-robot entry | LeRobot + Koch/SO-100 | Low cost + active community |