Quadruped Robots
Overview
Quadruped robots are among the most mature legged robot forms today. Compared to bipedal robots, quadrupeds have inherent static stability (the center of mass can remain within the support polygon when three or more legs are on the ground), which makes movement on unstructured terrain more reliable. In recent years, reinforcement learning has taken quadrupeds from slow walking to extreme parkour maneuvers.
Why Choose the Quadruped Form
- Stability: Four legs provide a larger support polygon, making a quadruped inherently more stable than a biped
- Payload capacity: Horizontal torso placement is suitable for carrying sensors and tools
- Terrain adaptation: Can cross gaps, climb slopes, and traverse gravel
- Maturity: Mature solutions exist from control theory to RL training
Gait Fundamentals
Gait Patterns
Quadruped gaits are defined by the phase relationships among the four legs. Each leg alternates between two phases: stance (foot on the ground) and swing (foot in the air).
| Gait | Legs on Ground | Duty Factor | Characteristics | Speed |
|---|---|---|---|---|
| Walk | 3 | ~75% | Always three feet on ground, statically stable | Slowest |
| Trot | 2 | ~50% | Diagonal legs synchronized, most common gait | Medium |
| Pace | 2 | ~50% | Ipsilateral legs synchronized, large lateral sway | Medium |
| Bound | 0-2 | ~30% | Front/rear leg pairs synchronized, has flight phase | Fast |
| Gallop | 0-3 | ~20-30% | Four legs touch down sequentially, has flight phase | Fastest |
Phase representation of gaits: The swing onset time \(t_i\) of leg \(i\), normalized by the gait period \(T\), is called the phase offset \(\phi_i\):
\[
\phi_i = \frac{t_i}{T}, \qquad \phi_i \in [0, 1)
\]
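The phase-offset idea can be sketched in a few lines of Python. The offsets below are common textbook values for the leg order LF, RF, LH, RH, and both the `GAIT_OFFSETS` table and the `legs_in_stance` helper are illustrative assumptions rather than part of any particular controller. Each leg is treated as swinging for a fraction `1 - duty_factor` of the period, starting at its phase offset.

```python
# Phase offsets as fractions of the gait period, leg order LF, RF, LH, RH.
# Assumed textbook values, for illustration only.
GAIT_OFFSETS = {
    "walk":  (0.0, 0.5, 0.75, 0.25),  # one leg swings at a time
    "trot":  (0.0, 0.5, 0.5, 0.0),    # diagonal pairs move together
    "pace":  (0.0, 0.5, 0.0, 0.5),    # same-side pairs move together
    "bound": (0.0, 0.0, 0.5, 0.5),    # front pair, then hind pair
}


def legs_in_stance(t, period, offsets, duty_factor):
    """Return one boolean per leg: True if that leg is in stance at time t."""
    swing_fraction = 1.0 - duty_factor
    stance = []
    for phi in offsets:
        # Phase elapsed since this leg's swing onset, wrapped to [0, 1).
        local_phase = (t / period - phi) % 1.0
        stance.append(local_phase >= swing_fraction)
    return tuple(stance)


# Trot with a 50% duty factor: LF and RH swing while RF and LH support.
print(legs_in_stance(0.1, 1.0, GAIT_OFFSETS["trot"], 0.5))
# → (False, True, True, False)
```

With the walk offsets and a 75% duty factor, exactly three legs are in stance at any instant, which matches the Walk row of the table above.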
Central Pattern Generator (CPG)
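A CPG is a biologically inspired model for generating rhythmic motion. It produces coordinated limb movement patterns through a network of coupled oscillators, without requiring continuous high-level commands.
Hopf oscillator model (a commonly used coupled form, with one oscillator \((x_i, y_i)\) per leg):
\[
\begin{aligned}
\dot{x}_i &= \alpha\,(\mu - r_i^2)\,x_i - \omega_i\,y_i \\
\dot{y}_i &= \alpha\,(\mu - r_i^2)\,y_i + \omega_i\,x_i + \sum_{j} w_{ij}\,y_j
\end{aligned}
\]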
where:
- \(r_i = \sqrt{x_i^2 + y_i^2}\): amplitude
- \(\mu\): controls limit cycle radius
- \(\omega_i\): angular frequency, controls gait frequency
- \(w_{ij}\): coupling weights, determine inter-leg phase relationships
- \(\alpha\): convergence rate
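The coupling matrix defines the gait pattern. For a trot, with legs ordered LF, RF, LH, RH:
\[
W_{\text{trot}} =
\begin{bmatrix}
0 & -1 & -1 & 1 \\
-1 & 0 & 1 & -1 \\
-1 & 1 & 0 & -1 \\
1 & -1 & -1 & 0
\end{bmatrix}
\]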
Positive coupling indicates in-phase, negative coupling indicates anti-phase.
```mermaid
graph TD
    subgraph CPG_Network["CPG Network"]
        LF[Left-Front Oscillator] -->|Anti-phase| RF[Right-Front Oscillator]
        LF -->|Anti-phase| LH[Left-Hind Oscillator]
        LF -->|In-phase| RH[Right-Hind Oscillator]
        RF -->|In-phase| LH
        RF -->|Anti-phase| RH
        LH -->|Anti-phase| RH
    end
    subgraph Output["Output"]
        LF --> LF_joint[Left-Front Joint Trajectory]
        RF --> RF_joint[Right-Front Joint Trajectory]
        LH --> LH_joint[Left-Hind Joint Trajectory]
        RH --> RH_joint[Right-Hind Joint Trajectory]
    end
    CMD[Velocity Command] --> LF
    CMD --> RF
    CMD --> LH
    CMD --> RH
```
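A small simulation can illustrate how the coupling signs pull four oscillators into a trot. This is a minimal sketch, not a production CPG. It assumes the coupling term \(\sum_j w_{ij} y_j\) enters the \(\dot{y}_i\) equation, uses forward Euler integration, and picks illustrative values for `alpha`, `mu`, `omega`, and `coupling_gain`. The function name `simulate_cpg` and the starting phases are hypothetical.

```python
import numpy as np

# Trot coupling, leg order LF, RF, LH, RH: +1 = in-phase, -1 = anti-phase.
W_TROT = np.array([
    [ 0, -1, -1,  1],
    [-1,  0,  1, -1],
    [-1,  1,  0, -1],
    [ 1, -1, -1,  0],
], dtype=float)


def simulate_cpg(W, duration=20.0, dt=1e-3, mu=1.0, alpha=10.0,
                 omega=2.0 * np.pi, coupling_gain=1.0):
    """Integrate four coupled Hopf oscillators and return their final phases."""
    # Arbitrary, unsynchronized starting phases (assumed for illustration).
    init_phase = np.array([0.0, 0.3, 1.0, 2.0])
    x = 0.5 * np.cos(init_phase)
    y = 0.5 * np.sin(init_phase)
    for _ in range(int(duration / dt)):
        r2 = x**2 + y**2
        dx = alpha * (mu - r2) * x - omega * y
        dy = alpha * (mu - r2) * y + omega * x + coupling_gain * (W @ y)
        x, y = x + dt * dx, y + dt * dy  # forward Euler step
    return np.arctan2(y, x)


phases = simulate_cpg(W_TROT)
# Cosine of each leg's phase difference to LF: near +1 in-phase, near -1 anti-phase.
relative = np.cos(phases - phases[0])
print(np.round(relative, 2))
```

In this sketch the diagonal partner of LF (RH) should end up near +1 and the other two legs near -1. Swapping in a different sign pattern for `W` is the usual way to select another gait.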
Stability Analysis
ZMP and Support Polygon
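For quadrupeds, the ZMP (Zero Moment Point) must lie within the support polygon formed by the feet in ground contact for the robot to remain dynamically stable.
The stability margin is defined as the shortest distance from the ZMP to the support polygon boundary:
\[
S = \min_{k}\; d\!\left(\mathbf{p}_{\text{ZMP}},\, e_k\right)
\]
where \(e_k\) are the edges of the support polygon and \(d\) is the point-to-edge distance.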
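The stability margin can be computed directly from the foot positions. The sketch below assumes a convex support polygon whose vertices are given in counterclockwise order in the ground plane. The function name `stability_margin` and the foot coordinates are hypothetical.

```python
import numpy as np


def stability_margin(zmp, feet_xy):
    """Signed distance from the ZMP to the nearest support polygon edge.

    feet_xy: (N, 2) contact points in counterclockwise order, N >= 3.
    Positive when the ZMP is inside the polygon, negative when outside
    (outside, the magnitude is only an approximation of the true distance).
    """
    zmp = np.asarray(zmp, dtype=float)
    verts = np.asarray(feet_xy, dtype=float)
    margins = []
    for a, b in zip(verts, np.roll(verts, -1, axis=0)):
        edge = b - a
        # Inward normal of a counterclockwise polygon: edge rotated by +90 degrees.
        normal = np.array([-edge[1], edge[0]]) / np.linalg.norm(edge)
        margins.append(float(np.dot(zmp - a, normal)))
    return min(margins)


# Four feet on the ground, 0.4 m apart fore-aft and 0.3 m apart laterally.
feet = [(0.2, 0.15), (-0.2, 0.15), (-0.2, -0.15), (0.2, -0.15)]
print(stability_margin((0.0, 0.0), feet))
print(stability_margin((0.3, 0.0), feet))
```

The first call returns a positive margin (ZMP at the center of the polygon). The second returns a negative value because the ZMP lies in front of the leading edge.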
Swing Leg Trajectory Planning
Bézier curves or parabolic arcs are commonly used to generate swing leg trajectories. The trajectory should satisfy:
- Sufficient lift height for obstacle clearance
- Minimal touchdown velocity (to reduce impact)
- Smooth transitions (no discontinuities in joint velocity or acceleration)
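The three requirements above can be met with a single Bézier curve. This is a minimal sketch in the sagittal (x, z) plane, assuming a degree-5 curve whose first and last control points are doubled so that the foot velocity is zero at liftoff and touchdown. The control-point layout and the names `bezier` and `swing_foot_position` are illustrative choices.

```python
from math import comb

import numpy as np


def bezier(control_points, s):
    """Evaluate a Bezier curve at s in [0, 1] using Bernstein polynomials."""
    pts = np.asarray(control_points, dtype=float)
    n = len(pts) - 1
    weights = np.array([comb(n, i) * s**i * (1.0 - s)**(n - i)
                        for i in range(n + 1)])
    return weights @ pts


def swing_foot_position(s, start_x, end_x, lift_height):
    """Foot (x, z) at swing phase s, from liftoff (s=0) to touchdown (s=1)."""
    # At s = 0.5 the two raised control points carry a combined weight of
    # 20/32 = 0.625, so scaling their height makes the apex equal lift_height.
    z_ctrl = lift_height / 0.625
    control_points = [
        (start_x, 0.0), (start_x, 0.0),      # doubled: zero liftoff velocity
        (start_x, z_ctrl), (end_x, z_ctrl),  # shape the arc
        (end_x, 0.0), (end_x, 0.0),          # doubled: zero touchdown velocity
    ]
    return bezier(control_points, s)


for s in (0.0, 0.5, 1.0):
    print(s, swing_foot_position(s, start_x=-0.1, end_x=0.1, lift_height=0.08))
```

The curve is infinitely differentiable within the swing, so any remaining discontinuity can only occur at the stance/swing boundary, where the doubled end points bring the velocity to zero.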
Reinforcement Learning Locomotion Control
Training Pipeline
```mermaid
graph LR
    A[Simulation Environment<br/>Isaac Gym / MuJoCo] --> B[Parallel Sampling<br/>Thousands of Environment Instances]
    B --> C[Policy Network<br/>MLP / GRU]
    C --> D[PPO Update]
    D --> B
    C --> E[Domain Randomization]
    E --> F[Sim-to-Real<br/>Deploy to Real Robot]
    subgraph Reward_Design["Reward Design"]
        R1[Velocity Tracking]
        R2[Energy Penalty]
        R3[Posture Penalty]
        R4[Foot Contact Pattern]
        R5[Action Smoothness]
    end
    R1 --> D
    R2 --> D
    R3 --> D
    R4 --> D
    R5 --> D
```
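The domain randomization step in the pipeline usually amounts to resampling physics parameters at the start of each episode. The sketch below shows the idea. The parameter names and ranges are hypothetical placeholders, not values from any specific framework or robot.

```python
import numpy as np

# Hypothetical (low, high) ranges; real projects tune these per robot.
RANDOMIZATION_RANGES = {
    "ground_friction": (0.3, 1.25),
    "payload_mass_kg": (-1.0, 3.0),
    "motor_strength_scale": (0.9, 1.1),
    "action_latency_s": (0.0, 0.02),
}


def sample_episode_params(rng, ranges=RANDOMIZATION_RANGES):
    """Draw one set of physics parameters, uniformly within each range."""
    return {name: float(rng.uniform(low, high))
            for name, (low, high) in ranges.items()}


rng = np.random.default_rng(0)
params = sample_episode_params(rng)
print(params)
```

Each parallel environment would apply its own sample, so the policy never sees exactly the same dynamics twice and has to learn behavior that tolerates the whole range.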
Reward Function Design
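A typical quadruped locomotion reward function:
```python
import numpy as np


def locomotion_reward(w, sigma_v, v_actual, v_cmd, tau, dq, a_t, a_prev,
                      euler_body, z_body, z_target, v_foot, f_contact):
    """Per-step reward; `w` maps term names to weights, array inputs are NumPy arrays."""
    return (
        # Positive rewards
        w["vel"] * np.exp(-np.sum((v_actual - v_cmd) ** 2) / sigma_v)  # Velocity tracking
        + w["alive"] * 1.0                                             # Alive reward
        # Penalties
        - w["energy"] * np.sum(np.abs(tau * dq))       # Energy consumption (joint power)
        - w["torque"] * np.sum(tau ** 2)               # Joint torques
        - w["action"] * np.sum(np.abs(a_t - a_prev))   # Action smoothness
        - w["orient"] * np.sum(euler_body ** 2)        # Body orientation deviation
        - w["z"] * (z_body - z_target) ** 2            # Body height
        # Foot slippage: foot speed (v_foot is 4x3) counted only while in contact
        - w["slip"] * np.sum(np.linalg.norm(v_foot, axis=-1) * f_contact)
    )
```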
Teacher-Student Distillation Framework
Teacher policy: Has access to privileged information (precise terrain heightmap, friction coefficients, external forces, etc.), which lets it reach strong performance in simulation.
Student policy: Uses only sensors available on the real robot (IMU, joint encoders, optional depth camera) and learns to reproduce the teacher's behavior through distillation.
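The distillation step can be illustrated with a toy supervised-learning loop. In this sketch both policies are linear maps and the data is synthetic, which is a deliberate simplification: real frameworks use neural networks and collect data by rolling out the policies in simulation. All names and dimensions here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, priv_dim, act_dim, n_samples = 8, 3, 4, 2048

# Stand-in teacher: a fixed linear map of [observation, privileged info].
W_teacher = rng.normal(size=(obs_dim + priv_dim, act_dim))
obs = rng.normal(size=(n_samples, obs_dim))
priv = 0.1 * rng.normal(size=(n_samples, priv_dim))
teacher_actions = np.hstack([obs, priv]) @ W_teacher

# Student: sees only the observations and regresses onto the teacher's actions.
W_student = np.zeros((obs_dim, act_dim))


def distill_loss(W):
    """Mean squared error between student and teacher actions."""
    return float(np.mean((obs @ W - teacher_actions) ** 2))


initial_loss = distill_loss(W_student)
for _ in range(200):
    error = obs @ W_student - teacher_actions
    grad = 2.0 * obs.T @ error / n_samples  # gradient of the per-sample squared error
    W_student -= 0.05 * grad
final_loss = distill_loss(W_student)

print(initial_loss, final_loss)
```

The student's loss drops sharply but does not reach zero, because part of the teacher's action depends on privileged inputs the student cannot observe. Closing that gap is what history encoders and recurrent students are for.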
Terrain Adaptation and Blind Locomotion
Blind Locomotion
Blind locomotion is locomotion control that relies only on proprioception (joint angles, IMU), without any vision or depth input. With sufficient domain randomization, blind policies can handle many terrains:
- Moderate slopes (< 25 degrees)
- Stairs (roughly known height)
- Gravel and uneven ground
Key insight: Observation history is critical to the success of blind locomotion. Processing the observation sequence with a GRU or LSTM lets the policy implicitly estimate terrain features it cannot sense directly.
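A fixed-length observation window is a simple way to give a policy access to history. The sketch below uses it as a stand-in for the recurrent state of a GRU or LSTM, so it is not the same mechanism. The class name `ObservationHistory` is hypothetical.

```python
from collections import deque

import numpy as np


class ObservationHistory:
    """Keeps the last `length` observations and returns them as one flat vector."""

    def __init__(self, obs_dim, length):
        # Start with zeros so the stacked vector has a fixed size from step one.
        self.buffer = deque([np.zeros(obs_dim)] * length, maxlen=length)

    def push(self, obs):
        """Append the newest observation and return the stacked history."""
        self.buffer.append(np.asarray(obs, dtype=float))
        return np.concatenate(list(self.buffer))


history = ObservationHistory(obs_dim=2, length=3)
history.push([1.0, 2.0])
stacked = history.push([3.0, 4.0])
print(stacked)
# → [0. 0. 1. 2. 3. 4.]
```

The stacked vector would be fed to an MLP policy. A recurrent network avoids choosing a window length, at the cost of more involved training.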
Vision-Aided Locomotion
Adding heightmaps built from a depth camera or LiDAR lets the policy handle more extreme terrain:
- Jumping across gaps
- Stepping stones
- Tall steps
Representative Platforms
| Platform | Developer | Weight | Features | Price/Positioning |
|---|---|---|---|---|
| Spot | Boston Dynamics | ~32 kg | Commercial-grade, Spot SDK, modular payloads | ~$75K, industrial inspection |
| Go2 | Unitree | ~15 kg | Consumer-grade, LiDAR included, open SDK | ~$1,600 starting |
| B2 | Unitree | ~60 kg | Industrial-grade, heavy payload, all-terrain | Industrial pricing |
| B2-W | Unitree | ~70 kg | Wheel-leg hybrid, balancing efficiency and obstacle crossing | Industrial pricing |
| ANYmal | ANYbotics (ETH) | ~50 kg | Industrial inspection, RL locomotion pioneer | Industrial pricing |
| Vision 60 | Ghost Robotics | ~51 kg | Military/security, IP67 protection | Defense pricing |
| DR01 | DeepRobotics | ~50 kg | Chinese quadruped, industrial inspection | Industrial pricing |
| CyberDog 2 | Xiaomi | ~8.9 kg | Consumer-grade, equipped with NX, open-source friendly | ~$3,000 |
Unitree Go2 Details
The Go2 is currently one of the best-value quadruped platforms for R&D:
- Computing platform: Jetson Orin NX (optional)
- Sensors: 3D LiDAR, front depth camera, ultra-wide-angle camera
- Battery life: ~1-2 hours
- SDK: C++/Python SDK provided, supports low-level joint control
- Community: Numerous open-source projects based on the Go2 platform
Milestone Achievements
ETH "Learning Agile Motor Skills" (2019)
- One of the first demonstrations of RL-learned agile locomotion on a real quadruped (ANYmal)
- A learned actuator network models the motor dynamics
- Direct sim-to-real transfer without fine-tuning
CMU "Extreme Parkour" (2023)
- Extreme parkour on Unitree A1: high platform jumps, gap crossing, obstacle leaping
- Visual input + RL policy
- Curriculum learning progressively increases obstacle difficulty
- Pushed the demonstrated limits of RL-based quadruped locomotion at the time
ETH "Legged Gym" / NVIDIA Isaac Lab
- Open-source quadruped/humanoid RL training framework
- Supports thousands of parallel simulation environments (GPU accelerated)
- Has become widely used infrastructure for legged robot RL research
```mermaid
timeline
    title Quadruped Robot RL Locomotion Key Milestones
    2017 : ETH ANYmal First Sim-to-Real
    2019 : "Learning Agile Motor Skills"<br/>Agile Locomotion Control
    2020 : Blind Locomotion Over Difficult Terrain<br/>Teacher-Student Distillation
    2021 : Legged Gym Open-sourced
         : RMA Adaptive Locomotion<br/>Implicit Terrain Estimation
    2023 : CMU Extreme Parkour<br/>Extreme Agility
    2024 : Isaac Lab Released<br/>Unified Training Platform
```
Control Architecture Overview
```mermaid
graph TB
    subgraph High_Level_Planning["High-Level Planning"]
        A[Task Goal] --> B[Path Planning<br/>A*/RRT]
        B --> C[Velocity Command<br/>vx, vy, yaw_rate]
    end
    subgraph Mid_Level_Policy["Mid-Level Policy"]
        C --> D{Policy Type}
        D -->|Traditional| E[CPG + Model Control]
        D -->|Learning| F[RL Policy Network]
        E --> G[Foot Trajectory]
        F --> G
    end
    subgraph Low_Level_Control["Low-Level Control"]
        G --> H[Inverse Kinematics]
        H --> I[Joint PD Controller]
        I --> J[Motor Driver]
    end
    subgraph Perception["Perception"]
        K[IMU] --> F
        L[Joint Encoders] --> F
        M[Depth Camera] --> F
        N[LiDAR] --> B
    end
    J --> O[Quadruped Robot]
    O --> K
    O --> L
```
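The low-level stage of the architecture (foot trajectory to inverse kinematics to joint PD control) can be sketched for a single planar leg. This is a simplified two-link model in the hip frame, with angles measured from the +x axis and the knee angle taken relative to the thigh. Those conventions, the link lengths, the gains, and the function names are all assumptions for illustration. Real legs also have a hip abduction joint.

```python
import numpy as np


def leg_ik(foot_x, foot_z, thigh_len, shank_len):
    """Planar two-link inverse kinematics; returns (hip, knee) in radians."""
    dist_sq = foot_x**2 + foot_z**2
    cos_knee = (dist_sq - thigh_len**2 - shank_len**2) / (2.0 * thigh_len * shank_len)
    knee = np.arccos(np.clip(cos_knee, -1.0, 1.0))  # clip guards unreachable targets
    hip = np.arctan2(foot_z, foot_x) - np.arctan2(
        shank_len * np.sin(knee), thigh_len + shank_len * np.cos(knee))
    return hip, knee


def leg_fk(hip, knee, thigh_len, shank_len):
    """Forward kinematics, used here to check the IK solution."""
    x = thigh_len * np.cos(hip) + shank_len * np.cos(hip + knee)
    z = thigh_len * np.sin(hip) + shank_len * np.sin(hip + knee)
    return x, z


def pd_torque(q_des, q, dq, kp, kd, dq_des=0.0):
    """Joint PD law: torque from position error and velocity error."""
    return kp * (q_des - q) + kd * (dq_des - dq)


# Foot target 5 cm forward and 30 cm below the hip, with 20 cm links.
hip, knee = leg_ik(0.05, -0.30, 0.20, 0.20)
print(leg_fk(hip, knee, 0.20, 0.20))
print(pd_torque(q_des=hip, q=hip - 0.1, dq=0.0, kp=20.0, kd=0.5))
```

Many RL policies skip the explicit IK step and output joint position targets directly. In that case only the PD stage remains between the policy and the motor driver.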
Further Reading
- Bellicoso et al., "Dynamic Locomotion Through Online Nonlinear Motion Optimization for Quadrupedal Robots", IEEE RA-L, 2018
- Hwangbo et al., "Learning Agile and Dynamic Motor Skills for Legged Robots", Science Robotics, 2019
- Kumar et al., "RMA: Rapid Motor Adaptation for Legged Robots", RSS, 2021
- Zhuang et al., "Robot Parkour Learning", CoRL, 2023
Related Notes: