Quadruped Robots

Overview

Quadruped robots are among the most mature legged robot forms today. Compared to bipedal robots, quadrupeds have inherent static stability (the center of mass can remain within the support polygon when three or more legs are on the ground), which makes locomotion on unstructured terrain more reliable. In recent years, reinforcement learning has taken quadrupeds from slow walking to extreme parkour maneuvers.

Why Choose the Quadruped Form

Stability: Four legs provide a larger support polygon, making the platform inherently more stable than a biped
Payload capacity: Horizontal torso placement is suitable for carrying sensors and tools
Terrain adaptation: Can cross gaps, climb slopes, and traverse gravel
Maturity: Well-established solutions exist, from model-based control to RL training

Gait Fundamentals

Gait Patterns

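Quadruped gaits are defined by the phase relationships of the four legs. Each leg alternates between two states: the stance phase (foot on the ground) and the swing phase (foot in the air). The duty factor is the fraction of the gait cycle that a leg spends in stance.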

Gait	Legs on Ground	Duty Factor	Characteristics	Speed
Walk	3	~75%	Always three feet on ground, statically stable	Slowest
Trot	2	~50%	Diagonal legs synchronized, most common gait	Medium
Pace	2	~50%	Ipsilateral legs synchronized, large lateral sway	Medium
Bound	0-2	~30%	Front/rear leg pairs synchronized, has flight phase	Fast
Gallop	0-3	~20-30%	Four legs touch down sequentially, has flight phase	Fastest

Phase representation of gaits: the swing onset time of leg $i$, normalized by the gait period $T$, is called its phase offset $\phi_i \in [0, 1)$:

\[ \text{Trot:} \quad \phi = [0, 0.5, 0.5, 0] \quad \text{(left-front, right-front, left-hind, right-hind)} \]

\[ \text{Walk:} \quad \phi = [0, 0.5, 0.75, 0.25] \quad \text{(same leg order)} \]
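The phase offsets can be turned into a contact schedule. The sketch below follows the text in treating $\phi_i$ as the start of leg $i$'s swing. It additionally assumes that each leg swings for $(1-\beta)T$ and then stands for $\beta T$, where $\beta$ is the duty factor from the table. The function name `stance_mask` and the 0.5 s period are illustrative choices.

```python
import numpy as np

# Leg order: LF, RF, LH, RH (same as the phase vectors above).
TROT = [0.0, 0.5, 0.5, 0.0]
WALK = [0.0, 0.5, 0.75, 0.25]


def stance_mask(t, period, phase_offsets, duty_factor):
    """Return a boolean array that is True for legs in stance at time t.

    Assumes phase_offsets[i] is the normalized swing onset of leg i, the swing
    lasts (1 - duty_factor) of the period, and stance fills the rest.
    """
    phi = np.asarray(phase_offsets, dtype=float)
    leg_phase = np.mod(t / period - phi, 1.0)  # 0 at swing onset, wraps every cycle
    return leg_phase >= (1.0 - duty_factor)


period = 0.5  # hypothetical gait period in seconds
for t in np.linspace(0.0, period, 8, endpoint=False):
    trot = stance_mask(t, period, TROT, 0.5).astype(int)
    walk = stance_mask(t, period, WALK, 0.75).astype(int)
    print(f"t={t:.4f} s  trot stance={trot}  walk stance={walk}")
```

With these offsets, the walk always has exactly three legs in stance, consistent with the table. In the trot, the diagonal pairs (LF/RH and RF/LH) switch together.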

Central Pattern Generator (CPG)

A CPG is a biologically inspired model for generating rhythmic motion: a network of coupled oscillators produces coordinated limb movement patterns without requiring continuous high-level commands.

Hopf oscillator model:

\[ \dot{x}_i = \alpha(\mu - r_i^2)x_i - \omega_i y_i + \sum_j w_{ij} x_j \]

\[ \dot{y}_i = \alpha(\mu - r_i^2)y_i + \omega_i x_i + \sum_j w_{ij} y_j \]

where:

$r_i = \sqrt{x_i^2 + y_i^2}$: amplitude
$\mu$: controls the limit cycle radius (an uncoupled oscillator settles at $r_i = \sqrt{\mu}$)
$\omega_i$: angular frequency, controls gait frequency
$w_{ij}$: coupling weights, determine inter-leg phase relationships
$\alpha$: convergence rate

The coupling matrix defines the gait pattern. For trot (rows and columns in the order LF, RF, LH, RH):

\[ W_{trot} = \begin{bmatrix} 0 & -1 & -1 & 1 \\ -1 & 0 & 1 & -1 \\ -1 & 1 & 0 & -1 \\ 1 & -1 & -1 & 0 \end{bmatrix} \]

A positive weight pulls two oscillators toward the same phase (in-phase); a negative weight pushes them half a cycle apart (anti-phase).

graph TD
    subgraph CPG_Network["CPG Network"]
        LF[Left-Front Oscillator] -->|Anti-phase| RF[Right-Front Oscillator]
        LF -->|Anti-phase| LH[Left-Hind Oscillator]
        LF -->|In-phase| RH[Right-Hind Oscillator]
        RF -->|In-phase| LH
        RF -->|Anti-phase| RH
        LH -->|Anti-phase| RH
    end

    subgraph Output["Output"]
        LF --> LF_joint[Left-Front Joint Trajectory]
        RF --> RF_joint[Right-Front Joint Trajectory]
        LH --> LH_joint[Left-Hind Joint Trajectory]
        RH --> RH_joint[Right-Hind Joint Trajectory]
    end

    CMD[Velocity Command] --> LF
    CMD --> RF
    CMD --> LH
    CMD --> RH
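A forward-Euler simulation of the Hopf network can be used to check that $W_{trot}$ settles into the trot phase pattern. The sketch makes several assumptions: all legs share one frequency, and the values of $\mu$, $\alpha$, the 2 Hz frequency, the step size, and the duration are illustrative, not taken from the text. The coupling is applied exactly as in the equations above ($\sum_j w_{ij} x_j$ and $\sum_j w_{ij} y_j$).

```python
import numpy as np

# Leg order: LF, RF, LH, RH (matches W_trot).
W_TROT = np.array([
    [0, -1, -1, 1],
    [-1, 0, 1, -1],
    [-1, 1, 0, -1],
    [1, -1, -1, 0],
], dtype=float)


def simulate_hopf_cpg(W, mu=1.0, alpha=10.0, freq_hz=2.0, dt=1e-3, duration=10.0, seed=0):
    """Integrate the coupled Hopf oscillators with forward Euler; return final (x, y)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=4)  # random initial states
    y = rng.uniform(-1.0, 1.0, size=4)
    omega = 2.0 * np.pi * freq_hz       # same angular frequency for every leg
    for _ in range(int(duration / dt)):
        r2 = x**2 + y**2
        dx = alpha * (mu - r2) * x - omega * y + W @ x  # W @ x = sum_j w_ij x_j
        dy = alpha * (mu - r2) * y + omega * x + W @ y
        x, y = x + dt * dx, y + dt * dy
    return x, y


def wrapped_phase_diff(theta_a, theta_b):
    """Phase difference mapped to (-pi, pi]."""
    return np.angle(np.exp(1j * (theta_a - theta_b)))


x, y = simulate_hopf_cpg(W_TROT)
theta = np.arctan2(y, x)
print("LF-RH phase difference (rad):", wrapped_phase_diff(theta[0], theta[3]))  # near 0
print("LF-RF phase difference (rad):", wrapped_phase_diff(theta[0], theta[1]))  # near +/- pi
```

With this additive coupling, the settled amplitude in the trot state is slightly larger than $\sqrt{\mu}$. Each oscillator receives three reinforcing coupling terms, giving $r^2 = \mu + 3/\alpha$. Mapping each oscillator's $(x_i, y_i)$ to joint or foot trajectories is a separate step that is not shown.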

Stability Analysis

ZMP and Support Polygon

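For quadrupeds on flat ground, the ZMP (Zero Moment Point) must lie within the support polygon formed by the feet in contact for the robot to maintain dynamic stability. When only two diagonal feet are in contact, as in a trot, the polygon degenerates to the line segment between them. For a multi-body model with link masses $m_i$, horizontal link center-of-mass positions $\mathbf{r}_i$, and heights $z_i$ (link rotational inertia neglected), the ZMP is: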

\[ \mathbf{p}_{ZMP} = \frac{\sum_i m_i(\ddot{z}_i + g)\mathbf{r}_i - \sum_i m_i \ddot{\mathbf{r}}_i z_i}{\sum_i m_i(\ddot{z}_i + g)} \]
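The ZMP formula above can be evaluated directly for a point-mass approximation of the robot. This sketch assumes each link is a point mass, z points up from a flat ground plane at z = 0, and g = 9.81 m/s²; the function name `zmp` is illustrative.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2


def zmp(masses, positions, accelerations, g=G):
    """Point-mass ZMP on the ground plane z = 0 (link rotational inertia neglected).

    masses:        (N,) link masses m_i
    positions:     (N, 3) link CoM positions [x, y, z], z measured up from the ground
    accelerations: (N, 3) link CoM accelerations
    """
    m = np.asarray(masses, dtype=float)
    p = np.asarray(positions, dtype=float)
    a = np.asarray(accelerations, dtype=float)
    r, z = p[:, :2], p[:, 2]        # horizontal position r_i and height z_i
    r_dd, z_dd = a[:, :2], a[:, 2]  # horizontal and vertical accelerations
    vertical = m * (z_dd + g)       # m_i (z_i'' + g)
    numerator = (vertical[:, None] * r).sum(axis=0) - ((m * z)[:, None] * r_dd).sum(axis=0)
    return numerator / vertical.sum()


# Static robot: the ZMP coincides with the ground projection of the center of mass.
print(zmp([1.0, 3.0], [[0.0, 0.0, 0.3], [0.4, 0.0, 0.3]], np.zeros((2, 3))))
# A single mass accelerating forward shifts the ZMP backward by z * x'' / g.
print(zmp([1.0], [[0.0, 0.0, 0.5]], [[1.0, 0.0, 0.0]]))
```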

Stability margin is defined as the shortest distance from ZMP to the support polygon boundary:

\[ SM = \min_{e \in \text{edges}} d(\mathbf{p}_{ZMP}, e) \]
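The stability margin can be computed as the minimum signed distance from the ZMP to the edge lines of a convex support polygon. This sketch assumes the stance feet are given in counter-clockwise order and that at least three feet are in contact. Inside the polygon, the result equals the shortest distance to the boundary. Outside, it is negative, but its magnitude can underestimate the true distance near a corner. The foot coordinates in the example are hypothetical.

```python
import numpy as np


def stability_margin(p, polygon):
    """Signed stability margin of point p for a convex support polygon.

    polygon: (K, 2) stance-foot positions in counter-clockwise order, K >= 3.
    Positive inside (distance to the nearest edge), negative outside.
    """
    p = np.asarray(p, dtype=float)
    verts = np.asarray(polygon, dtype=float)
    margins = []
    for a, b in zip(verts, np.roll(verts, -1, axis=0)):
        edge = b - a
        rel = p - a
        # 2D cross product: positive when p lies to the left of edge a->b (inside for CCW order)
        cross = edge[0] * rel[1] - edge[1] * rel[0]
        margins.append(cross / np.linalg.norm(edge))
    return min(margins)


# Hypothetical standing pose, feet ordered CCW: RF, LF, LH, RH (x forward, y left).
feet = [(0.2, -0.15), (0.2, 0.15), (-0.2, 0.15), (-0.2, -0.15)]
print(stability_margin((0.0, 0.0), feet))  # limited by the side edges at |y| = 0.15
```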

Swing Leg Trajectory Planning

Bezier curves or parabolic arcs are commonly used to generate swing leg trajectories that provide:

Sufficient lift height for obstacle clearance
Minimal touchdown velocity (reduce impact)
Smooth transitions (avoid joint velocity/acceleration discontinuities)
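A degree-4 Bezier curve is one way to meet these requirements. This is a sketch; the control-point layout is an illustrative choice, not a prescribed one. Repeating the first and last control points gives zero foot velocity at liftoff and touchdown. Raising the middle control point by $\tfrac{8}{3}h$ places the foot exactly $h$ above the start/end midpoint at mid-swing, because the middle Bernstein weight at $s = 0.5$ is $6/16$.

```python
import numpy as np
from math import comb


def bezier(control_points, s):
    """Evaluate a Bezier curve at s in [0, 1] using the Bernstein basis."""
    P = np.asarray(control_points, dtype=float)
    n = len(P) - 1
    basis = np.array([comb(n, k) * s**k * (1 - s) ** (n - k) for k in range(n + 1)])
    return basis @ P


def swing_foot_position(p_start, p_end, height, s):
    """Foot position at normalized swing time s in [0, 1] (degree-4 Bezier).

    Repeated end control points -> zero velocity at liftoff and touchdown.
    Middle point raised by 8/3 * height -> foot is `height` above the midpoint at s = 0.5.
    """
    p0 = np.asarray(p_start, dtype=float)
    p4 = np.asarray(p_end, dtype=float)
    mid = 0.5 * (p0 + p4) + np.array([0.0, 0.0, 8.0 / 3.0 * height])
    return bezier([p0, p0, mid, p4, p4], s)


# Hypothetical 20 cm step with 8 cm clearance, sampled over the swing phase.
for s in np.linspace(0.0, 1.0, 5):
    print(s, swing_foot_position([0.0, 0.0, 0.0], [0.2, 0.0, 0.0], 0.08, s))
```

This curve keeps velocity continuous at liftoff and touchdown, but its acceleration there is nonzero. Matching acceleration as well would require a higher-degree curve with additional repeated control points.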

Reinforcement Learning Locomotion Control

Training Pipeline

graph LR
    A[Simulation Environment<br/>Isaac Gym / MuJoCo] --> B[Parallel Sampling<br/>Thousands of Environment Instances]
    B --> C[Policy Network<br/>MLP / GRU]
    C --> D[PPO Update]
    D --> B

    C --> E[Domain Randomization]
    E --> F[Sim-to-Real<br/>Deploy to Real Robot]

    subgraph Reward Design
        R1[Velocity Tracking] 
        R2[Energy Penalty]
        R3[Posture Penalty]
        R4[Foot Contact Pattern]
        R5[Action Smoothness]
    end

    R1 --> D
    R2 --> D
    R3 --> D
    R4 --> D
    R5 --> D
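Two pieces of this training loop can be sketched in NumPy. The first is the PPO clipped surrogate loss, shown in its standard form with the commonly used clip value ε = 0.2. The second is per-environment domain randomization. The parameter names and ranges in `DR_RANGES` are hypothetical examples, not values from a specific framework. Computing log-probabilities and advantages from rollouts (for example with GAE) is not shown.

```python
import numpy as np


def ppo_clip_loss(log_prob_new, log_prob_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    ratio = np.exp(log_prob_new - log_prob_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))


# Hypothetical randomization ranges, one independent draw per parallel environment.
DR_RANGES = {
    "friction": (0.5, 1.25),
    "added_base_mass_kg": (-1.0, 1.0),
    "motor_strength_scale": (0.9, 1.1),
}


def sample_domain_params(num_envs, rng):
    """Draw physics parameters uniformly for each simulated environment."""
    return {name: rng.uniform(lo, hi, size=num_envs) for name, (lo, hi) in DR_RANGES.items()}


rng = np.random.default_rng(0)
params = sample_domain_params(4096, rng)
print({name: (round(v.min(), 3), round(v.max(), 3)) for name, v in params.items()})
```

The clipping removes the incentive to push the probability ratio outside $[1-\epsilon, 1+\epsilon]$ in a single update. This limits how far each PPO step moves the policy while data from thousands of environments is reused.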

Reward Function Design

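Typical quadruped locomotion reward function, written as a NumPy sketch. The function name, the weight dictionary `w`, and the tracking width `sigma_v` are illustrative tuning parameters:

import numpy as np

def compute_reward(v_actual, v_cmd, tau, dq, a_t, a_prev, euler_body,
                   z_body, z_target, v_foot, f_contact, w, sigma_v):
    # Inputs are NumPy arrays: tau, dq, a_t, a_prev per joint;
    # v_foot has shape (n_feet, 3); f_contact is a 0/1 contact flag per foot.

    # Positive rewards
    r = w["vel"] * np.exp(-np.sum((v_actual - v_cmd) ** 2) / sigma_v)  # Velocity tracking
    r += w["alive"] * 1.0                                             # Alive reward

    # Penalties
    r -= w["energy"] * np.sum(np.abs(tau * dq))      # Energy consumption (|mechanical power|)
    r -= w["torque"] * np.sum(tau ** 2)              # Joint torques
    r -= w["action"] * np.sum(np.abs(a_t - a_prev))  # Action smoothness
    r -= w["orient"] * np.sum(euler_body ** 2)       # Body orientation deviation
    r -= w["z"] * (z_body - z_target) ** 2           # Body height
    # Foot slippage: foot speed counted only for feet in contact
    r -= w["slip"] * np.sum(np.linalg.norm(v_foot, axis=-1) * f_contact)
    return r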

Teacher-Student Distillation Framework

Teacher policy: Has access to privileged information (precise terrain heightmap, friction coefficients, external forces, etc.) and is trained with RL to perform well in simulation.

Student policy: Uses only real-world-available sensors (IMU, joint encoders, optional depth camera), learning teacher behavior through distillation.

\[ \mathcal{L}_{distill} = \mathbb{E}\left[\|{\pi_{student}(o_t) - \pi_{teacher}(o_t, o_t^{priv})}\|^2\right] \]
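The distillation loss can be illustrated with a toy example. A frozen teacher sees $o_t$ and $o_t^{priv}$, and a linear student that sees only $o_t$ is fit to it by gradient descent on the squared action error. Everything here is hypothetical: the random teacher, the dimensions, the learning rate, and the fixed offline batch. In practice, students are often trained on states from their own rollouts (DAgger-style), usually with an observation history as input.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, PRIV_DIM, ACT_DIM, BATCH = 8, 3, 4, 512

# Hypothetical frozen teacher that uses both o_t and the privileged o_t^priv.
W_teacher = 0.5 * rng.normal(size=(ACT_DIM, OBS_DIM + PRIV_DIM))


def teacher_policy(obs, priv):
    return np.tanh(np.concatenate([obs, priv], axis=1) @ W_teacher.T)


def student_policy(W_student, obs):
    return obs @ W_student.T  # linear student: no access to privileged inputs


def distill_loss(W_student, obs, priv):
    diff = student_policy(W_student, obs) - teacher_policy(obs, priv)
    return np.mean(np.sum(diff**2, axis=1))  # batch estimate of E[||pi_s - pi_t||^2]


obs = rng.normal(size=(BATCH, OBS_DIM))
priv = rng.normal(size=(BATCH, PRIV_DIM))
W_student = np.zeros((ACT_DIM, OBS_DIM))
initial = distill_loss(W_student, obs, priv)
lr = 0.1
for _ in range(300):
    diff = student_policy(W_student, obs) - teacher_policy(obs, priv)
    grad = 2.0 * diff.T @ obs / BATCH  # gradient of distill_loss w.r.t. W_student
    W_student -= lr * grad
final = distill_loss(W_student, obs, priv)
print(f"distillation loss: {initial:.4f} -> {final:.4f}")
```

The loss cannot reach zero here because the student never observes the privileged inputs. That gap is what history-based students try to close by inferring the missing information over time.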

Blind Locomotion

Blind locomotion relies only on proprioception (joint angles, IMU) without any vision or depth information. With sufficient domain randomization, blind policies can handle many terrains:

Moderate slopes (< 25 degrees)
Stairs (with step heights within the range seen in training)
Gravel and uneven ground

Key insight: observation history is critical for successful blind locomotion. A GRU/LSTM that processes the observation sequence can implicitly estimate terrain features that a single observation cannot reveal.
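A minimal GRU cell shows how a recurrent encoder folds a proprioceptive history into a hidden state. This sketch uses the Cho et al. gate formulation with biases omitted and random, untrained weights; in a real policy these weights would be learned. The class name `GRUCell` and the dimensions are illustrative.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class GRUCell:
    """Minimal GRU cell (biases omitted, random untrained weights)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        shape_in, shape_h = (hidden_dim, input_dim), (hidden_dim, hidden_dim)
        self.Wz, self.Wr, self.Wh = (rng.uniform(-scale, scale, shape_in) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.uniform(-scale, scale, shape_h) for _ in range(3))
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)              # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)              # reset gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))  # candidate state
        return (1.0 - z) * h + z * h_tilde                  # blend old and candidate state

    def encode(self, obs_sequence):
        """Run over a (T, input_dim) observation history; return the final hidden state."""
        h = np.zeros(self.hidden_dim)
        for x in obs_sequence:
            h = self.step(x, h)
        return h


cell = GRUCell(input_dim=6, hidden_dim=16)
history = np.random.default_rng(1).normal(size=(20, 6))  # hypothetical 20-step history
print(cell.encode(history).round(3))
```

Two histories that end with the same observations but differ earlier produce different hidden states. This is what allows a recurrent policy to react to terrain effects that only show up over several steps.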

Vision-Aided Locomotion

Adding heightmaps from a depth camera or LiDAR lets the robot handle more extreme terrain:

Jumping across gaps
Stepping stones
Tall steps
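A heightmap is often fed to the policy as a grid of terrain heights sampled around the robot, expressed relative to the base. This sketch makes three assumptions: a callable world-frame terrain model (hypothetical), a yaw-only body frame that ignores roll and pitch (a common simplification), and the illustrative function name `sample_height_scan`.

```python
import numpy as np


def sample_height_scan(terrain_height, base_xy, base_z, yaw, xs, ys):
    """Sample terrain heights on a yaw-aligned grid around the robot base.

    terrain_height: callable (x, y) -> world-frame height (hypothetical terrain model)
    xs, ys: 1-D arrays of scan offsets in the body frame (x forward, y left)
    Returns shape (len(xs), len(ys)): terrain height minus base height.
    """
    bx, by = np.meshgrid(xs, ys, indexing="ij")
    c, s = np.cos(yaw), np.sin(yaw)
    wx = base_xy[0] + c * bx - s * by  # rotate body-frame offsets into the world frame
    wy = base_xy[1] + s * bx + c * by
    return terrain_height(wx, wy) - base_z


def step_terrain(x, y):
    """Hypothetical terrain: a 10 cm step for x > 0."""
    return np.where(x > 0.0, 0.1, 0.0)


xs = np.linspace(-0.5, 0.5, 11)
ys = np.linspace(-0.3, 0.3, 7)
scan = sample_height_scan(step_terrain, (0.0, 0.0), 0.3, 0.0, xs, ys)
print(scan.shape)
print(scan[:, 3].round(2))  # center column: step appears ahead of the robot
```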

Representative Platforms

Platform	Developer	Weight	Features	Price/Positioning
Spot	Boston Dynamics	~32 kg	Commercial-grade, Spot SDK, modular payloads	~$75K, industrial inspection
Go2	Unitree	~15 kg	Consumer-grade, LiDAR included, open SDK	~$1,600 starting
B2	Unitree	~60 kg	Industrial-grade, heavy payload, all-terrain	Industrial pricing
B2-W	Unitree	~70 kg	Wheel-leg hybrid, balancing efficiency and obstacle crossing	Industrial pricing
ANYmal	ANYbotics (ETH)	~50 kg	Industrial inspection, RL locomotion pioneer	Industrial pricing
Vision 60	Ghost Robotics	~51 kg	Military/security, IP67 protection	Defense pricing
DR01	DeepRobotics	~50 kg	Chinese quadruped, industrial inspection	Industrial pricing
CyberDog 2	Xiaomi	~8.9 kg	Consumer-grade, equipped with NX, open-source friendly	~$3,000

Unitree Go2 Details

Go2 is currently one of the most cost-effective quadruped R&D platforms:

Computing platform: Jetson Orin NX (optional)
Sensors: 3D LiDAR, front depth camera, ultra-wide-angle camera
Battery life: ~1-2 hours
SDK: C++/Python SDK provided, supports low-level joint control
Community: Numerous open-source projects based on the Go2 platform

Milestone Achievements

ETH "Learning Agile and Dynamic Motor Skills for Legged Robots" (2019)

First demonstration of RL-learned agile locomotion on a real quadruped (ANYmal)
Actuator network models motor dynamics
Direct sim-to-real transfer without fine-tuning

CMU "Extreme Parkour" (2023)

Extreme parkour on a Unitree A1: jumping onto high platforms, crossing gaps, and leaping over obstacles
Depth-camera input + RL policy
Curriculum learning progressively increases obstacle difficulty
Pushed the state of the art in agile RL-based quadruped locomotion

ETH "Legged Gym" / NVIDIA Isaac Lab

Open-source quadruped/humanoid RL training framework
Supports thousands of parallel simulation environments (GPU accelerated)
Has become the standard infrastructure for legged robot RL research

timeline
    title Quadruped Robot RL Locomotion Key Milestones
    2017 : ETH ANYmal First Sim-to-Real
    2019 : "Learning Agile Motor Skills"<br/>Agile Locomotion Control
    2020 : Blind Locomotion Over Difficult Terrain<br/>Teacher-Student Distillation
    2021 : RMA Adaptive Locomotion<br/>Implicit Terrain Estimation : Legged Gym Open-sourced
    2023 : CMU Extreme Parkour<br/>Extreme Agility
    2024 : Isaac Lab Released<br/>Unified Training Platform

Control Architecture Overview

graph TB
    subgraph High_Level_Planning["High-Level Planning"]
        A[Task Goal] --> B[Path Planning<br/>A*/RRT]
        B --> C[Velocity Command<br/>vx, vy, yaw_rate]
    end

    subgraph Mid_Level_Policy["Mid-Level Policy"]
        C --> D{Policy Type}
        D -->|Traditional| E[CPG + Model Control]
        D -->|Learning| F[RL Policy Network]
        E --> G[Foot Trajectory]
        F -->|Joint position targets| I
    end

    subgraph Low_Level_Control["Low-Level Control"]
        G --> H[Inverse Kinematics]
        H --> I[Joint PD Controller]
        I --> J[Motor Driver]
    end

    subgraph Perception["Perception"]
        K[IMU] --> F
        L[Joint Encoders] --> F
        M[Depth Camera] --> F
        N[LiDAR] --> B
    end

    J --> O[Quadruped Robot]
    O --> K
    O --> L
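The low-level layer of this architecture can be sketched for a single leg. Inverse kinematics maps a foot target to joint angles, and a joint PD law turns joint targets into torques. In many RL locomotion setups, the policy output is mapped directly to joint position targets for the PD loop; the default-pose-plus-scaled-action mapping below is an assumption. The link lengths, gains, action scale, and sign convention are all hypothetical. The model is a planar hip-pitch/knee leg with x forward and z up, measured relative to the hip joint.

```python
import numpy as np

L_THIGH, L_CALF = 0.213, 0.213  # hypothetical link lengths in meters


def leg_fk(q_hip, q_knee, l1=L_THIGH, l2=L_CALF):
    """Sagittal-plane forward kinematics: foot (x, z) relative to the hip pitch joint."""
    x = l1 * np.sin(q_hip) + l2 * np.sin(q_hip + q_knee)
    z = -l1 * np.cos(q_hip) - l2 * np.cos(q_hip + q_knee)
    return np.array([x, z])


def leg_ik(x, z, l1=L_THIGH, l2=L_CALF):
    """Analytic inverse of leg_fk, choosing the q_knee >= 0 branch.

    In this sign convention that branch puts the knee behind the hip for a foot
    under the body; which branch matches a real robot is an assumption.
    """
    d2 = x * x + z * z
    cos_knee = np.clip((d2 - l1**2 - l2**2) / (2.0 * l1 * l2), -1.0, 1.0)  # law of cosines
    q_knee = np.arccos(cos_knee)
    q_hip = np.arctan2(x, -z) - np.arctan2(l2 * np.sin(q_knee), l1 + l2 * np.cos(q_knee))
    return q_hip, q_knee


def joint_pd_torque(q_des, q, dq, kp=20.0, kd=0.5):
    """PD law tau = kp (q_des - q) - kd dq with zero target velocity (hypothetical gains)."""
    return kp * (np.asarray(q_des) - np.asarray(q)) - kd * np.asarray(dq)


def policy_action_to_targets(action, q_default, action_scale=0.25):
    """Map policy outputs to joint position targets around a default pose (hypothetical)."""
    return np.asarray(q_default) + action_scale * np.asarray(action)


q_hip, q_knee = leg_ik(0.05, -0.30)
print("IK:", q_hip, q_knee, "FK check:", leg_fk(q_hip, q_knee))
print("PD torque:", joint_pd_torque(0.6, 0.5, 0.0))
```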