Quadruped Robots

Overview

Quadruped robots are among the most mature legged robot forms today. Compared with bipeds, quadrupeds can be statically stable (the center of mass stays inside the support polygon whenever three or more legs are on the ground), which makes locomotion on unstructured terrain more reliable. In recent years, reinforcement learning has taken quadrupeds from slow walking to extreme parkour maneuvers.

Why Choose the Quadruped Form

  • Stability: Four legs provide a larger support polygon, making the robot inherently more stable than a biped
  • Payload capacity: The horizontal torso is well suited to carrying sensors and tools
  • Terrain adaptation: Can cross gaps, climb slopes, and traverse gravel
  • Maturity: Established solutions exist, from classical control to RL training

Gait Fundamentals

Gait Patterns

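Quadruped gaits are defined by the phase relationships among the four legs. Each leg alternates between two states: stance (foot on the ground) and swing (foot in the air). The duty factor is the fraction of the gait period that a leg spends in stance.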

| Gait | Legs on Ground | Duty Factor | Characteristics | Speed |
|---|---|---|---|---|
| Walk | 3 | ~75% | Always three feet on ground, statically stable | Slowest |
| Trot | 2 | ~50% | Diagonal legs synchronized, most common gait | Medium |
| Pace | 2 | ~50% | Ipsilateral legs synchronized, large lateral sway | Medium |
| Bound | 0-2 | ~30% | Front/rear leg pairs synchronized, has flight phase | Fast |
| Gallop | 0-3 | ~20-30% | Four legs touch down sequentially, has flight phase | Fastest |

Phase representation of gaits: The swing onset time of each leg, expressed as a fraction of the gait period \(T\), is called the phase offset \(\phi_i\). Legs are listed in the order left-front, right-front, left-hind, right-hind:

\[ \text{Trot:} \quad \phi = [0, 0.5, 0.5, 0] \quad \text{(left-front, right-front, left-hind, right-hind)} \]
\[ \text{Walk:} \quad \phi = [0, 0.5, 0.75, 0.25] \]
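
The phase offsets and duty factors above can be turned into a per-leg contact schedule. The following is a minimal sketch, not a reference implementation: it assumes each leg starts swinging at its phase offset and swings for a fraction (1 - duty factor) of the period. The names `GAITS`, `in_swing`, `contact_state`, and `feet_on_ground`, and the default period of 0.5 s, are illustrative choices.

```python
# Phase offsets in the order LF, RF, LH, RH, with duty factors from the table.
GAITS = {
    "trot": {"offsets": [0.0, 0.5, 0.5, 0.0], "duty": 0.5},
    "walk": {"offsets": [0.0, 0.5, 0.75, 0.25], "duty": 0.75},
}
LEGS = ["LF", "RF", "LH", "RH"]


def in_swing(t, period, offset, duty):
    """Return True if a leg is in swing at time t.

    Assumption: swing starts at phase `offset` and lasts (1 - duty) of a cycle.
    """
    phase = (t / period - offset) % 1.0
    return phase < (1.0 - duty)


def contact_state(gait, t, period=0.5):
    """Return {leg: True if the leg is in stance} for the named gait."""
    g = GAITS[gait]
    return {
        leg: not in_swing(t, period, offset, g["duty"])
        for leg, offset in zip(LEGS, g["offsets"])
    }


def feet_on_ground(gait, t, period=0.5):
    """Count the legs that are in stance at time t."""
    return sum(contact_state(gait, t, period).values())
```

With these settings, `contact_state("trot", 0.1)` reports the right-front and left-hind feet in stance, and sampling `feet_on_ground("walk", t)` over a cycle always gives three, matching the table.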

Central Pattern Generator (CPG)

A CPG is a biologically inspired model for generating rhythmic motion. A network of coupled oscillators produces coordinated limb movement patterns without needing continuous high-level commands.

Hopf oscillator model:

\[ \dot{x}_i = \alpha(\mu - r_i^2)x_i - \omega_i y_i + \sum_j w_{ij} x_j \]
\[ \dot{y}_i = \alpha(\mu - r_i^2)y_i + \omega_i x_i + \sum_j w_{ij} y_j \]

where:

  • \(r_i = \sqrt{x_i^2 + y_i^2}\): oscillator amplitude
  • \(\mu\): sets the limit cycle radius (\(\sqrt{\mu}\) for an uncoupled oscillator)
  • \(\omega_i\): angular frequency, sets the gait frequency
  • \(w_{ij}\): coupling weights, determine inter-leg phase relationships
  • \(\alpha\): convergence rate toward the limit cycle

The coupling matrix defines gait patterns. For Trot:

\[ W_{trot} = \begin{bmatrix} 0 & -1 & -1 & 1 \\ -1 & 0 & 1 & -1 \\ -1 & 1 & 0 & -1 \\ 1 & -1 & -1 & 0 \end{bmatrix} \]

Rows and columns follow the leg order left-front, right-front, left-hind, right-hind. Positive coupling pulls two legs in-phase; negative coupling pushes them anti-phase.

graph TD
    subgraph CPG_Network["CPG Network"]
        LF[Left-Front Oscillator] -->|Anti-phase| RF[Right-Front Oscillator]
        LF -->|Anti-phase| LH[Left-Hind Oscillator]
        LF -->|In-phase| RH[Right-Hind Oscillator]
        RF -->|In-phase| LH
        RF -->|Anti-phase| RH
        LH -->|Anti-phase| RH
    end

    subgraph Output["Output"]
        LF --> LF_joint[Left-Front Joint Trajectory]
        RF --> RF_joint[Right-Front Joint Trajectory]
        LH --> LH_joint[Left-Hind Joint Trajectory]
        RH --> RH_joint[Right-Hind Joint Trajectory]
    end

    CMD[Velocity Command] --> LF
    CMD --> RF
    CMD --> LH
    CMD --> RH
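
To see how the trot coupling matrix shapes the phase relationships, the Hopf equations can be integrated numerically. The sketch below uses forward Euler and plain Python lists. The coupling gain `k`, the parameter values, the step size, and the initial state are assumptions chosen for illustration; they are not taken from a specific controller.

```python
import math

# Trot coupling matrix from the text; leg order: LF, RF, LH, RH.
W_TROT = [
    [0, -1, -1, 1],
    [-1, 0, 1, -1],
    [-1, 1, 0, -1],
    [1, -1, -1, 0],
]


def hopf_step(x, y, W, dt, alpha=10.0, mu=1.0, omega=2 * math.pi * 2.0, k=0.5):
    """One forward-Euler step of the coupled Hopf oscillators.

    k is an assumed gain that scales the entries of W (hypothetical parameter).
    """
    n = len(x)
    new_x, new_y = [], []
    for i in range(n):
        r2 = x[i] ** 2 + y[i] ** 2
        couple_x = sum(W[i][j] * x[j] for j in range(n))
        couple_y = sum(W[i][j] * y[j] for j in range(n))
        dx = alpha * (mu - r2) * x[i] - omega * y[i] + k * couple_x
        dy = alpha * (mu - r2) * y[i] + omega * x[i] + k * couple_y
        new_x.append(x[i] + dt * dx)
        new_y.append(y[i] + dt * dy)
    return new_x, new_y


def simulate(W, x0, y0, duration=10.0, dt=0.001):
    """Integrate from the initial state and return the final (x, y)."""
    x, y = list(x0), list(y0)
    for _ in range(int(duration / dt)):
        x, y = hopf_step(x, y, W, dt)
    return x, y


# Start from a state that is not a trot and let the coupling pull it into one.
x, y = simulate(W_TROT, [1.0, -0.5, -0.2, 0.7], [0.0, 0.5, -0.6, 0.3])
```

After the transient, the diagonal pairs (LF with RH, RF with LH) carry the same oscillator state and the two pairs are half a cycle apart. Note that the coupling term also enlarges the steady-state amplitude slightly beyond \(\sqrt{\mu}\), which is one reason to keep the gain small.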

Stability Analysis

ZMP and Support Polygon

For quadrupeds, the ZMP (Zero Moment Point) must lie within the support polygon formed by the ground contact feet for the robot to maintain dynamic stability.

\[ \mathbf{p}_{ZMP} = \frac{\sum_i m_i(\ddot{z}_i + g)\mathbf{r}_i - \sum_i m_i \ddot{\mathbf{r}}_i z_i}{\sum_i m_i(\ddot{z}_i + g)} \]

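Here \(m_i\) is the mass of body segment \(i\), \(\mathbf{r}_i = (x_i, y_i)\) its horizontal position, \(z_i\) its height above the ground, and \(g\) the gravitational acceleration.

The stability margin is defined as the shortest distance from the ZMP to the support polygon boundary: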

\[ SM = \min_{e \in \text{edges}} d(\mathbf{p}_{ZMP}, e) \]
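
The two formulas above can be evaluated directly. The sketch below is illustrative only: it assumes flat ground at z = 0, treats the robot as a set of point masses, and assumes the stance feet form a convex polygon listed counter-clockwise. The function names are hypothetical.

```python
import math


def zmp_from_point_masses(masses, positions, accelerations, g=9.81):
    """ZMP (x, y) on flat ground for a set of point masses.

    positions and accelerations hold one (x, y, z) tuple per mass.
    """
    den = sum(m * (a[2] + g) for m, a in zip(masses, accelerations))
    zmp = []
    for axis in (0, 1):
        num = sum(
            m * (a[2] + g) * p[axis] - m * a[axis] * p[2]
            for m, p, a in zip(masses, positions, accelerations)
        )
        zmp.append(num / den)
    return tuple(zmp)


def stability_margin(zmp, feet_ccw):
    """Smallest signed distance from the ZMP to the support polygon edges.

    feet_ccw: stance-foot (x, y) positions, convex and counter-clockwise
    (assumption). Positive means inside; a negative value means the ZMP
    lies outside the polygon.
    """
    margin = math.inf
    n = len(feet_ccw)
    for i in range(n):
        ax, ay = feet_ccw[i]
        bx, by = feet_ccw[(i + 1) % n]
        ex, ey = bx - ax, by - ay
        # The 2D cross product gives the signed distance to the edge line.
        signed = (ex * (zmp[1] - ay) - ey * (zmp[0] - ax)) / math.hypot(ex, ey)
        margin = min(margin, signed)
    return margin
```

For a rectangular stance of 0.4 m by 0.2 m centered at the origin, a ZMP at the origin has a margin of 0.1 m. With zero accelerations the ZMP reduces to the ground projection of the center of mass, which is the static-stability case mentioned in the overview.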

Swing Leg Trajectory Planning

Bezier curves or parabolic arcs are commonly used to generate swing leg trajectories, satisfying:

  • Sufficient lift height for obstacle clearance
  • Minimal touchdown velocity (reduce impact)
  • Smooth transitions (avoid joint velocity/acceleration discontinuities)
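
One way to meet the first two requirements is a degree-5 Bezier curve with duplicated end control points. The sketch below works in the sagittal (x, z) plane; the control-point layout, the default step length, and the lift height are assumptions for illustration. It makes the foot velocity zero at lift-off and touchdown but does not constrain the acceleration there.

```python
from math import comb


def bezier(control_points, s):
    """Evaluate a Bezier curve with 2D control points at s in [0, 1]."""
    n = len(control_points) - 1
    x = z = 0.0
    for i, (px, pz) in enumerate(control_points):
        b = comb(n, i) * s**i * (1.0 - s) ** (n - i)  # Bernstein weight
        x += b * px
        z += b * pz
    return x, z


def swing_control_points(x_start, x_end, lift_height):
    """Control polygon for one swing.

    Duplicating the first and last points gives zero velocity at both ends.
    The two middle points sit at 1.6 * lift_height so that the curve peaks
    at exactly lift_height at mid-swing (their Bernstein weights sum to
    20/32 there, and 20/32 * 1.6 = 1).
    """
    top = 1.6 * lift_height
    return [
        (x_start, 0.0), (x_start, 0.0), (x_start, top),
        (x_end, top), (x_end, 0.0), (x_end, 0.0),
    ]


def swing_foot_position(s, x_start=-0.05, x_end=0.05, lift_height=0.08):
    """Foot position (x, z) at normalized swing time s in [0, 1]."""
    return bezier(swing_control_points(x_start, x_end, lift_height), s)
```

Here `s` is the normalized swing phase, so the same curve can be replayed at any gait frequency by mapping time to `s`.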

Reinforcement Learning Locomotion Control

Training Pipeline

graph LR
    A[Simulation Environment<br/>Isaac Gym / MuJoCo] --> B[Parallel Sampling<br/>Thousands of Environment Instances]
    B --> C[Policy Network<br/>MLP / GRU]
    C --> D[PPO Update]
    D --> B

    C --> E[Domain Randomization]
    E --> F[Sim-to-Real<br/>Deploy to Real Robot]

    subgraph Reward Design
        R1[Velocity Tracking] 
        R2[Energy Penalty]
        R3[Posture Penalty]
        R4[Foot Contact Pattern]
        R5[Action Smoothness]
    end

    R1 --> D
    R2 --> D
    R3 --> D
    R4 --> D
    R5 --> D

Reward Function Design

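A typical quadruped locomotion reward combines tracking rewards with regularization penalties (pseudocode; the w_* weights and sigma_v are tuning parameters):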

reward = (
    # Positive rewards
    w_vel * exp(-||v_actual - v_cmd||^2 / sigma_v)   # Velocity tracking
    + w_alive * 1.0                                     # Alive reward

    # Penalties
    - w_energy * sum(|tau * dq|)        # Energy consumption
    - w_torque * sum(tau^2)             # Joint torques
    - w_action * sum(|a_t - a_{t-1}|)  # Action smoothness
    - w_orient * ||euler_body||^2       # Body orientation deviation
    - w_z * (z_body - z_target)^2       # Body height
    - w_slip * sum(|v_foot| * f_contact)  # Foot slippage
)
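
The pseudocode above can be written as a runnable function. This is a sketch under assumptions: inputs are plain Python lists for a single time step, `foot_speed` holds non-negative foot speeds, `foot_contact` holds 0/1 contact flags, and the weights in `EXAMPLE_WEIGHTS` are placeholder values rather than tuned settings.

```python
import math

# Placeholder weights for illustration only (assumed, not tuned values).
EXAMPLE_WEIGHTS = {
    "vel": 1.0, "alive": 0.2, "energy": 1e-3, "torque": 1e-4,
    "action": 1e-2, "orient": 0.5, "z": 2.0, "slip": 0.1,
}


def locomotion_reward(v_actual, v_cmd, tau, dq, action, prev_action,
                      euler_body, z_body, z_target, foot_speed, foot_contact,
                      w=EXAMPLE_WEIGHTS, sigma_v=0.25):
    """Scalar reward for one time step, mirroring the terms listed above."""
    vel_err = sum((a - c) ** 2 for a, c in zip(v_actual, v_cmd))
    # Positive rewards: velocity tracking and staying alive.
    reward = w["vel"] * math.exp(-vel_err / sigma_v) + w["alive"]
    # Penalties: energy, torque, action change, orientation, height, slip.
    reward -= w["energy"] * sum(abs(t * d) for t, d in zip(tau, dq))
    reward -= w["torque"] * sum(t ** 2 for t in tau)
    reward -= w["action"] * sum(abs(a - p) for a, p in zip(action, prev_action))
    reward -= w["orient"] * sum(e ** 2 for e in euler_body)
    reward -= w["z"] * (z_body - z_target) ** 2
    reward -= w["slip"] * sum(v * c for v, c in zip(foot_speed, foot_contact))
    return reward
```

With perfect velocity tracking and all penalty terms at zero, the reward equals the velocity weight plus the alive weight; any tracking error or penalty lowers it from there.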

Teacher-Student Distillation Framework

Teacher policy: Has access to privileged information that is only available in simulation (precise terrain heightmap, friction coefficients, external forces, etc.), which lets it reach near-optimal performance there.

Student policy: Uses only real-world-available sensors (IMU, joint encoders, optional depth camera), learning teacher behavior through distillation.

\[ \mathcal{L}_{distill} = \mathbb{E}\left[\|{\pi_{student}(o_t) - \pi_{teacher}(o_t, o_t^{priv})}\|^2\right] \]
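
As a toy illustration of this loss, the sketch below fits a linear student to teacher actions by gradient descent. Everything here is an assumption made for the example: the linear student, the synthetic teacher (which depends on one privileged value the student cannot see), the batch size, and the learning rate. Real pipelines use neural network policies and collect data by rolling out the policies in simulation.

```python
import random


def student_action(weights, obs):
    """Linear student policy: one output per row of `weights`."""
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]


def distill_loss(weights, batch):
    """Mean squared L2 distance between student and teacher actions."""
    total = 0.0
    for obs, teacher_act in batch:
        pred = student_action(weights, obs)
        total += sum((p - a) ** 2 for p, a in zip(pred, teacher_act))
    return total / len(batch)


def distill_step(weights, batch, lr=0.05):
    """One gradient-descent step on the distillation loss."""
    grad = [[0.0] * len(weights[0]) for _ in weights]
    for obs, teacher_act in batch:
        pred = student_action(weights, obs)
        for k, (p, a) in enumerate(zip(pred, teacher_act)):
            for d, o in enumerate(obs):
                grad[k][d] += 2.0 * (p - a) * o / len(batch)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]


# Synthetic data: the teacher sees obs plus a privileged value (e.g. friction).
rng = random.Random(0)
batch = []
for _ in range(64):
    obs = [rng.uniform(-1, 1) for _ in range(3)]
    priv = rng.uniform(0.3, 1.0)
    teacher_act = [0.8 * obs[0] - 0.2 * obs[1] + 0.1 * priv, 0.5 * obs[2]]
    batch.append((obs, teacher_act))

weights = [[0.0] * 3 for _ in range(2)]
loss_before = distill_loss(weights, batch)
for _ in range(200):
    weights = distill_step(weights, batch)
loss_after = distill_loss(weights, batch)
```

The loss drops substantially but does not reach zero, because part of the teacher's action depends on privileged information that the student has no way to observe from a single observation.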

Terrain Adaptation and Blind Locomotion

Blind Locomotion

Blind locomotion relies only on proprioception (joint angles, IMU), with no vision or depth input. With sufficient domain randomization, blind policies can handle many terrains:

  • Moderate slopes (< 25 degrees)
  • Stairs (roughly known height)
  • Gravel and uneven ground

Key insight: Observation history is critical for blind locomotion. Processing the observation sequence with a GRU or LSTM lets the policy implicitly estimate terrain properties.
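
A simpler way to expose history to a policy, instead of a recurrent network, is to stack the last few observations into one input vector. The sketch below shows that alternative, not a GRU; the class name `ObservationHistory` and its methods are hypothetical.

```python
from collections import deque


class ObservationHistory:
    """Fixed-length buffer of recent proprioceptive observations."""

    def __init__(self, obs_dim, length):
        self.obs_dim = obs_dim
        # Pre-fill with zeros so the stacked vector always has the same size.
        self.buffer = deque(
            [[0.0] * obs_dim for _ in range(length)], maxlen=length
        )

    def push(self, obs):
        """Append the newest observation; the oldest one is dropped."""
        if len(obs) != self.obs_dim:
            raise ValueError("unexpected observation size")
        self.buffer.append(list(obs))

    def stacked(self):
        """Concatenate the history, oldest first, into one flat input."""
        return [value for obs in self.buffer for value in obs]
```

A feed-forward policy fed with `stacked()` can, in principle, pick up cues such as a foot touching down earlier than expected, at the cost of a fixed memory window.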

Vision-Aided Locomotion

Adding depth-camera input or LiDAR heightmaps lets the robot handle more extreme terrain:

  • Jumping across gaps
  • Stepping stones
  • Tall steps

Representative Platforms

| Platform | Developer | Weight | Features | Price/Positioning |
|---|---|---|---|---|
| Spot | Boston Dynamics | ~32 kg | Commercial-grade, Spot SDK, modular payloads | ~$75K, industrial inspection |
| Go2 | Unitree | ~15 kg | Consumer-grade, LiDAR included, open SDK | ~$1,600 starting |
| B2 | Unitree | ~60 kg | Industrial-grade, heavy payload, all-terrain | Industrial pricing |
| B2-W | Unitree | ~70 kg | Wheel-leg hybrid, balancing efficiency and obstacle crossing | Industrial pricing |
| ANYmal | ANYbotics (ETH) | ~50 kg | Industrial inspection, RL locomotion pioneer | Industrial pricing |
| Vision 60 | Ghost Robotics | ~51 kg | Military/security, IP67 protection | Defense pricing |
| DR01 | DeepRobotics | ~50 kg | Chinese quadruped, industrial inspection | Industrial pricing |
| CyberDog 2 | Xiaomi | ~8.9 kg | Consumer-grade, equipped with NX, open-source friendly | ~$3,000 |

Unitree Go2 Details

Go2 is currently one of the best-value quadruped R&D platforms:

  • Computing platform: Jetson Orin NX (optional)
  • Sensors: 3D LiDAR, front depth camera, ultra-wide-angle camera
  • Battery life: ~1-2 hours
  • SDK: C++/Python SDK provided, supports low-level joint control
  • Community: Numerous open-source projects based on the Go2 platform

Milestone Achievements

ETH "Learning Agile and Dynamic Motor Skills" (2019)

  • First demonstration of RL-learned agile locomotion on a real quadruped (ANYmal)
  • Actuator network models motor dynamics
  • Direct sim-to-real transfer without fine-tuning

CMU "Extreme Parkour" (2023)

  • Extreme parkour on Unitree A1: high platform jumps, gap crossing, obstacle leaping
  • Visual input + RL policy
  • Curriculum learning progressively increases obstacle difficulty
  • Pushed the demonstrated limits of RL-based quadruped agility at the time

ETH "Legged Gym" / NVIDIA Isaac Lab

  • Open-source quadruped/humanoid RL training framework
  • Supports thousands of parallel simulation environments (GPU accelerated)
  • Has become the standard infrastructure for legged robot RL research

timeline
    title Quadruped Robot RL Locomotion Key Milestones
    2017 : ETH ANYmal First Sim-to-Real
    2019 : "Learning Agile Motor Skills"<br/>Agile Locomotion Control
    2021 : Legged Gym Open-sourced
         : RMA Adaptive Locomotion<br/>Implicit Terrain Estimation
    2022 : Blind Locomotion Over Difficult Terrain<br/>Teacher-Student Distillation
    2023 : CMU Extreme Parkour<br/>Extreme Agility
    2024 : Isaac Lab Released<br/>Unified Training Platform

Control Architecture Overview

graph TB
    subgraph High_Level_Planning["High-Level Planning"]
        A[Task Goal] --> B[Path Planning<br/>A*/RRT]
        B --> C[Velocity Command<br/>vx, vy, yaw_rate]
    end

    subgraph Mid_Level_Policy["Mid-Level Policy"]
        C --> D{Policy Type}
        D -->|Traditional| E[CPG + Model Control]
        D -->|Learning| F[RL Policy Network]
        E --> G[Foot Trajectory]
        F --> G
    end

    subgraph Low_Level_Control["Low-Level Control"]
        G --> H[Inverse Kinematics]
        H --> I[Joint PD Controller]
        I --> J[Motor Driver]
    end

    subgraph Perception["Perception"]
        K[IMU] --> F
        L[Joint Encoders] --> F
        M[Depth Camera] --> F
        N[LiDAR] --> B
    end

    J --> O[Quadruped Robot]
    O --> K
    O --> L

Further Reading

  • Bellicoso et al., "Dynamic Locomotion Through Online Nonlinear Motion Optimization for Quadrupedal Robots", IEEE RA-L, 2018
  • Hwangbo et al., "Learning Agile and Dynamic Motor Skills for Legged Robots", Science Robotics, 2019
  • Kumar et al., "RMA: Rapid Motor Adaptation for Legged Robots", RSS, 2021
  • Zhuang et al., "Robot Parkour Learning", CoRL, 2023
