Unmanned Aerial Vehicles (UAVs)

Overview

Unmanned Aerial Vehicles (UAVs) carry embodied intelligence into three-dimensional space. From agricultural crop spraying to logistics delivery, and from aerial cinematography to military reconnaissance, drones are transforming multiple industries. Autonomous drones tightly integrate perception, planning, and control, because they must make decisions in real time during high-speed three-dimensional motion.

UAV Classification

Rotorcraft, Fixed-Wing, and VTOL

Type	Representative	Advantages	Disadvantages	Typical Applications
Multirotor	DJI Mavic, Crazyflie	Hover, low-speed maneuver, simple	Short endurance, low efficiency	Aerial photography, inspection
Fixed-wing	senseFly eBee	Long endurance, high speed, efficient	Cannot hover, needs runway/catapult	Mapping, long-range inspection
VTOL	Wingtra	Combines hover and cruise	Mechanically complex, difficult transition	Logistics, long-range inspection
Single-rotor	Traditional helicopter	Heavy payload, long endurance	Mechanically complex, high vibration	Agricultural spraying

Quadrotor as Research Platform

Quadrotors are the most commonly used research platform because:

Simple mechanical structure (4 motors + propellers)
Analytically derivable dynamics model
Easy miniaturization (indoor flight)
Rich open-source ecosystem

Quadrotor Dynamics

Coordinate System Definitions

World frame \(\{W\}\): NED (North-East-Down) or ENU (East-North-Up); the equations below assume a z-up convention (ENU)
Body frame \(\{B\}\): Origin at center of mass, \(x\) forward, \(z\) upward (along the thrust direction)

Rotation matrix \(R \in SO(3)\) transforms from body frame to world frame.

Newton-Euler Equations

Translational equation:

\[ m\ddot{\mathbf{p}} = \begin{bmatrix} 0 \\ 0 \\ -mg \end{bmatrix} + R \begin{bmatrix} 0 \\ 0 \\ T \end{bmatrix} \]

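where \(\mathbf{p} = [x, y, z]^T\) is the position in the world frame, \(m\) is the vehicle mass, \(g\) is the gravitational acceleration, and \(T = \sum_{i=1}^{4} f_i\) is the total thrust along the body \(z\) axis.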

Rotational equation:

\[ J\dot{\boldsymbol{\omega}} = -\boldsymbol{\omega} \times J\boldsymbol{\omega} + \boldsymbol{\tau} \]

where \(J\) is the inertia tensor, \(\boldsymbol{\omega}\) is the angular velocity expressed in the body frame, and \(\boldsymbol{\tau}\) is the total torque in the body frame.
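
As a rough illustration of the two equations above, the following sketch evaluates the translational and rotational accelerations for a given state. It assumes a z-up world frame and a diagonal inertia tensor; the function names and the numeric values are illustrative assumptions, not part of any flight stack.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def mat_vec(M, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]


def linear_accel(R, thrust, mass, g=9.81):
    """World-frame acceleration: p_ddot = [0, 0, -g] + R [0, 0, T] / m."""
    thrust_world = mat_vec(R, [0.0, 0.0, thrust])
    return [thrust_world[0] / mass,
            thrust_world[1] / mass,
            thrust_world[2] / mass - g]


def angular_accel(J_diag, omega, tau):
    """Body-frame omega_dot from J omega_dot = -omega x (J omega) + tau.

    J_diag holds the diagonal of J (assumption: no products of inertia).
    """
    J_omega = [J_diag[i] * omega[i] for i in range(3)]
    gyro = cross(omega, J_omega)
    return [(tau[i] - gyro[i]) / J_diag[i] for i in range(3)]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mass = 1.0  # kg, hypothetical vehicle
hover_acc = linear_accel(IDENTITY, thrust=mass * 9.81, mass=mass)
```

With a level attitude and \(T = mg\), `hover_acc` is zero in all three axes, which is the hover condition implied by the translational equation.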

Thrust and Torque Allocation

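Each motor \(i\) spinning at speed \(\omega_i\) (a scalar, not to be confused with the body angular velocity \(\boldsymbol{\omega}\)) produces a thrust \(f_i\) and a reaction torque \(\tau_i\), both proportional to the square of the speed: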

\[ f_i = k_f \omega_i^2, \quad \tau_i = k_m \omega_i^2 \]

Total thrust and torques:

\[ \begin{bmatrix} T \\ \tau_\phi \\ \tau_\theta \\ \tau_\psi \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -L & 0 & L \\ L & 0 & -L & 0 \\ -k_m/k_f & k_m/k_f & -k_m/k_f & k_m/k_f \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{bmatrix} \]

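where \(L\) is the distance from each motor to the center of mass. The matrix as written corresponds to a "plus" configuration, in which each motor contributes to either roll or pitch but not both. For \(L \neq 0\) and \(k_m \neq 0\) it is invertible, so the individual motor thrusts, and from them the motor speeds \(\omega_i = \sqrt{f_i / k_f}\), can be solved from the desired thrust and torques.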
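
The inversion can be sketched in closed form for the allocation matrix above. The function names and the parameter values used in the example call are illustrative assumptions; real mixers also handle motor saturation, which this sketch only approximates by clamping negative thrusts to zero when computing speeds.

```python
import math


def mix(f, L, k_f, k_m):
    """Forward map: motor thrusts [f1..f4] -> (T, tau_phi, tau_theta, tau_psi)."""
    c = k_m / k_f
    f1, f2, f3, f4 = f
    return (f1 + f2 + f3 + f4,
            L * (f4 - f2),
            L * (f1 - f3),
            c * (-f1 + f2 - f3 + f4))


def allocate(T, tau_phi, tau_theta, tau_psi, L, k_f, k_m):
    """Inverse map: desired thrust and torques -> motor thrusts [f1..f4]."""
    c = k_m / k_f
    f1 = T / 4 + tau_theta / (2 * L) - tau_psi / (4 * c)
    f2 = T / 4 - tau_phi / (2 * L) + tau_psi / (4 * c)
    f3 = T / 4 - tau_theta / (2 * L) - tau_psi / (4 * c)
    f4 = T / 4 + tau_phi / (2 * L) + tau_psi / (4 * c)
    return [f1, f2, f3, f4]


def motor_speeds(f, k_f):
    """Motor speeds from f_i = k_f * omega_i^2; negative thrusts are clamped to 0."""
    return [math.sqrt(max(fi, 0.0) / k_f) for fi in f]


# Hypothetical parameters: arm length in m, thrust and torque coefficients.
L_ARM, K_F, K_M = 0.17, 1e-5, 1e-7
thrusts = allocate(9.81, 0.02, -0.01, 0.005, L_ARM, K_F, K_M)
recovered = mix(thrusts, L_ARM, K_F, K_M)
```

Applying `mix` to the output of `allocate` recovers the commanded thrust and torques, which is a quick way to check a sign convention before flying with it.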

Differential Flatness: The quadrotor system is differentially flat: all states and inputs can be expressed as functions of the flat outputs \([x, y, z, \psi]^T\) (position + yaw angle) and their derivatives. This means that planning a sufficiently smooth position and yaw trajectory is enough to derive the complete state and the control inputs.
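
One step of the flatness map can be sketched as follows: given a desired acceleration and yaw, recover the thrust magnitude and the body axes. This follows the usual construction from the minimum snap literature under a z-up world frame; the function name is an assumption, and the construction is singular when the thrust direction is parallel to the yaw heading vector.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def attitude_from_flat(acc, yaw, mass, g=9.81):
    """Map desired world acceleration and yaw to (thrust, (x_b, y_b, z_b)).

    The body z axis points along the required thrust vector m * (acc + g e3);
    yaw then fixes the remaining rotation about that axis.
    """
    t = [acc[0], acc[1], acc[2] + g]
    thrust = mass * math.sqrt(sum(x * x for x in t))
    z_b = normalize(t)
    x_c = [math.cos(yaw), math.sin(yaw), 0.0]  # heading direction in the world frame
    y_b = normalize(cross(z_b, x_c))
    x_b = cross(y_b, z_b)
    return thrust, (x_b, y_b, z_b)


hover_thrust, hover_axes = attitude_from_flat([0.0, 0.0, 0.0], yaw=0.0, mass=1.0)
```

For zero acceleration and zero yaw the axes reduce to the identity rotation and the thrust to \(mg\). Angular velocity and torque follow from higher derivatives of the flat outputs, which this sketch does not cover.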

Control Architecture

PX4 / ArduPilot Flight Stack

graph TB
    subgraph Ground_Station["Ground Station"]
        GCS[QGroundControl / Mission Planner]
    end

    subgraph Companion_Computer["Companion Computer"]
        COMP[Jetson / RPI / NUC] --> VIO[Visual Odometry]
        COMP --> DET[Object Detection]
        COMP --> PLAN[Path Planning]
    end

    subgraph Flight_Controller["Flight Controller PX4/ArduPilot"]
        POS[Position Controller<br/>PID] --> ATT[Attitude Controller<br/>PID / Quaternion]
        ATT --> RATE[Rate Controller<br/>PID]
        RATE --> MIX[Motor Mixer]
        MIX --> ESC[ESC]

        EKF[Extended Kalman Filter<br/>State Estimation] --> POS

        IMU[IMU] --> EKF
        BARO[Barometer] --> EKF
        GPS[GPS] --> EKF
        MAG[Magnetometer] --> EKF
    end

    GCS -->|MAVLink| COMP
    COMP -->|MAVLink| POS
    VIO -->|Pose| EKF
    ESC --> M1[Motors 1-4]

Cascaded PID Controller

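PX4's default multicopter controller is a cascade of P and PID loops: a position loop (P) feeding a velocity loop (PID), a quaternion-based attitude loop (P), and a body-rate loop (PID). A simplified three-layer view of this cascade is: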

Outer loop — Position control:

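\[ \mathbf{a}_{cmd} = K_p^{pos}(\mathbf{p}_{ref} - \mathbf{p}) + K_d^{pos}(\mathbf{v}_{ref} - \mathbf{v}) + K_i^{pos} \int (\mathbf{p}_{ref} - \mathbf{p}) dt + g\,\mathbf{e}_3 \]

where \(\mathbf{e}_3 = [0, 0, 1]^T\) and the term \(g\,\mathbf{e}_3\) cancels gravity in the z-up world frame used above, so that \(m\,\mathbf{a}_{cmd}\) is the desired thrust vector.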

Middle loop — Attitude control: Extract desired attitude from desired acceleration, then compute desired angular velocity.

Inner loop — Rate control:

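\[ \boldsymbol{\tau}_{cmd} = K_p^{rate}(\boldsymbol{\omega}_{ref} - \boldsymbol{\omega}) - K_d^{rate}\dot{\boldsymbol{\omega}} + K_i^{rate}\int(\boldsymbol{\omega}_{ref} - \boldsymbol{\omega})dt \]

The derivative term acts on the measured angular rate and carries a negative sign, so it damps the rotation instead of amplifying it.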
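
A single-axis version of the inner rate loop might look like the sketch below. It is not PX4's implementation: the class name and gains are hypothetical, the derivative is taken on the measured rate with a negative sign so that it damps the motion (a common implementation choice, assumed here), and there is no anti-windup or filtering.

```python
class RatePID:
    """Single-axis rate controller: torque command from rate error."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_rate = None  # no derivative on the first sample

    def update(self, rate_ref, rate, dt):
        error = rate_ref - rate
        self.integral += error * dt
        # Finite-difference derivative of the measured rate (not of the error),
        # which avoids a spike when the setpoint jumps.
        if self.prev_rate is None:
            rate_dot = 0.0
        else:
            rate_dot = (rate - self.prev_rate) / dt
        self.prev_rate = rate
        return self.kp * error + self.ki * self.integral - self.kd * rate_dot


roll_rate_pid = RatePID(kp=2.0, ki=0.0, kd=0.0)  # hypothetical gains
tau_cmd = roll_rate_pid.update(rate_ref=1.0, rate=0.0, dt=0.01)
```

In a full cascade, the attitude loop would supply `rate_ref` and the output would go to the motor mixer; one such controller would run per body axis.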

Advanced Control Methods

Model Predictive Control (MPC): Optimizes control sequences within a finite time horizon, handling constraints:

\[ \min_{u_{0:N-1}} \sum_{k=0}^{N} \|\mathbf{x}_k - \mathbf{x}_{ref}\|_Q^2 + \sum_{k=0}^{N-1} \|\mathbf{u}_k\|_R^2 \]

\[ \text{s.t.} \quad \mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k), \quad \mathbf{u}_{min} \leq \mathbf{u}_k \leq \mathbf{u}_{max} \]
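
To make the receding-horizon idea concrete, the sketch below evaluates the cost above for a one-dimensional double integrator and picks the best control sequence by exhaustive search over a small discrete input set. The exhaustive search is a stand-in for a real optimizer, chosen only to keep the example dependency-free; the model, weights, horizon, and names are all assumptions.

```python
from itertools import product

DT = 0.5  # s, discretization step (assumption)


def step(state, u):
    """Explicit-Euler double integrator: state = (position, velocity)."""
    p, v = state
    return (p + v * DT, v + u * DT)


def rollout_cost(x0, controls, x_ref=(0.0, 0.0), q=(1.0, 0.1), r=0.01):
    """Sum of weighted state errors for k = 0..N plus control effort for k = 0..N-1."""
    def state_cost(x):
        return q[0] * (x[0] - x_ref[0]) ** 2 + q[1] * (x[1] - x_ref[1]) ** 2

    cost, x = 0.0, x0
    for u in controls:
        cost += state_cost(x) + r * u ** 2
        x = step(x, u)
    return cost + state_cost(x)  # terminal state x_N


def mpc_first_control(x0, horizon=4, candidates=(-1.0, 0.0, 1.0)):
    """Search all sequences within the input bounds; apply only the first input."""
    best = min(product(candidates, repeat=horizon),
               key=lambda seq: rollout_cost(x0, seq))
    return best[0], best


u0, best_sequence = mpc_first_control((1.0, 0.0))
```

Starting one unit away from the reference, the best sequence begins by accelerating toward it. At the next time step the search would be repeated from the new state, which is what distinguishes MPC from open-loop trajectory optimization.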

SE(3) Geometric Control: Designs controllers directly on the \(SE(3)\) Lie group, avoiding gimbal lock, suitable for large-angle maneuvers.
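
The attitude error used in geometric control is computed directly from rotation matrices, with no Euler angles involved. The sketch below implements the commonly used form \(\mathbf{e}_R = \tfrac{1}{2}(R_d^T R - R^T R_d)^\vee\); the helper names are assumptions, and a full controller would add the angular velocity error and feedforward terms.

```python
import math


def transpose(M):
    """Transpose a 3x3 matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def vee(S):
    """Map a skew-symmetric matrix to its 3-vector."""
    return [S[2][1], S[0][2], S[1][0]]


def attitude_error(R, R_des):
    """e_R = 0.5 * vee(R_des^T R - R^T R_des)."""
    A = matmul(transpose(R_des), R)
    A_t = transpose(A)  # equals R^T R_des
    S = [[A[i][j] - A_t[i][j] for j in range(3)] for i in range(3)]
    return [0.5 * x for x in vee(S)]


def rot_z(theta):
    """Rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
yaw_error = attitude_error(rot_z(0.3), IDENTITY)
```

For a pure yaw offset of \(\theta\) the error vector is \([0, 0, \sin\theta]^T\), so it stays well defined for large angles where an Euler-angle error would approach a singularity.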

GPS-Denied Navigation

In indoor or tunnel environments where GPS is unavailable, drones must rely on onboard sensors for localization:

Visual-Inertial Odometry (VIO): Fuses camera and IMU data for pose estimation:

- VINS-Mono / VINS-Fusion: Open-source VIO system from HKUST
- MSCKF: Multi-State Constraint Kalman Filter
- ORB-SLAM3: Supports VIO mode

LiDAR SLAM:

- LOAM / LIO-SAM: LiDAR odometry and mapping (LOAM) and tightly coupled LiDAR-inertial odometry (LIO-SAM)
- High accuracy but heavy sensors, suitable for larger drones

Motion Planning

UAV path planning must consider:

- Dynamic constraints (max velocity, acceleration, angular rate)
- Collision avoidance (static + dynamic obstacles)
- Energy optimization (minimize total thrust variation)

Common methods:

Minimum snap trajectory: Minimizes fourth-order derivative (snap), ensuring smooth trajectories:

\[ \min \int_0^T \left\|\frac{d^4 \mathbf{p}}{dt^4}\right\|^2 dt \]

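EGO-Planner: ESDF-free gradient-based local planner from ZJU; it avoids building a Euclidean Signed Distance Field (ESDF), which gives good real-time performance
Fast-Planner: Open-source fast motion planning system from the HKUST Aerial Robotics Group, which uses an ESDF for gradient information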
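
For the minimum snap objective above, the simplest case has a closed form: a single one-dimensional segment that starts and ends at rest, with velocity, acceleration, and jerk all zero at both ends. The minimizer is then a 7th-degree polynomial fixed entirely by the eight boundary conditions. The sketch below evaluates it; the function name is an assumption, and multi-segment problems with waypoints need a quadratic program instead.

```python
def min_snap_rest_to_rest(p0, p1, duration, t):
    """Position at time t on the rest-to-rest minimum snap segment from p0 to p1.

    Uses the blend 35 s^4 - 84 s^5 + 70 s^6 - 20 s^7 with s = t / duration,
    whose first three derivatives vanish at s = 0 and s = 1.
    """
    s = min(max(t / duration, 0.0), 1.0)  # clamp outside the segment
    blend = 35 * s ** 4 - 84 * s ** 5 + 70 * s ** 6 - 20 * s ** 7
    return p0 + (p1 - p0) * blend


# Sample a 2 m move over 4 s at five evenly spaced times.
samples = [min_snap_rest_to_rest(0.0, 2.0, 4.0, t) for t in (0.0, 1.0, 2.0, 3.0, 4.0)]
```

The derivative of the blend is \(140\,s^3(1-s)^3\), which shows directly why the motion starts and stops smoothly. For a three-dimensional trajectory the same polynomial can be applied per axis.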

Learning-based Agile Flight

UZH RPG Group's Work

The Robotics and Perception Group (RPG) at the University of Zurich achieved breakthrough results in learning-based agile flight:

Swift (2023): An RL-trained drone that beat human champion pilots in head-to-head racing:

- Trained with PPO in simulation
- Visual perception via onboard RGB camera + gate detection
- Won multiple races against the human champions and recorded the fastest race time

Agile Autonomy (2021): End-to-end learned high-speed obstacle avoidance:

- Depth image input -> trajectory point output
- Flies at up to about 10 m/s through dense forests

Sim-to-Real for UAVs

Sim-to-real transfer of RL policies for UAVs must account for:

- Aerodynamic effects (rotor wake, ground effect)
- Motor response delay
- IMU noise and bias
- Communication latency
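
One common way to handle these effects is domain randomization: resample the simulator parameters at the start of every training episode so the policy cannot overfit to one set of values. The sketch below shows the idea only; the field names and all numeric ranges are hypothetical placeholders, not measured values from any platform.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    thrust_scale: float            # lumps unmodeled aerodynamic effects
    motor_time_constant_s: float   # first-order motor response delay
    gyro_bias_rad_s: float         # constant IMU gyro bias
    gyro_noise_std_rad_s: float    # IMU gyro noise level
    command_latency_s: float       # communication latency


def sample_sim_params(rng):
    """Draw one set of simulator parameters (placeholder ranges)."""
    return SimParams(
        thrust_scale=rng.uniform(0.9, 1.1),
        motor_time_constant_s=rng.uniform(0.02, 0.08),
        gyro_bias_rad_s=rng.uniform(-0.02, 0.02),
        gyro_noise_std_rad_s=rng.uniform(0.001, 0.01),
        command_latency_s=rng.uniform(0.0, 0.04),
    )


rng = random.Random(0)  # seeded for reproducible episodes
episode_params = [sample_sim_params(rng) for _ in range(3)]
```

In practice the ranges would be chosen from system identification on the real vehicle, widened enough to cover modeling error.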

Aerial Manipulation

UAV Grasping

Aerial manipulation combines drones with robot arms for airborne grasping and manipulation:

- Challenge: Grasping-induced external forces/torques severely affect flight stability
- Solutions: Over-actuated platforms or adaptive control
- Applications: High-altitude inspection, hazardous material handling, construction

Multi-UAV Cooperative Transport

Multiple drones cooperatively transport large objects via cables or rigid connections:

\[ \sum_{i=1}^{N} \mathbf{f}_i + m_L \mathbf{g} = m_L \ddot{\mathbf{p}}_L \]

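Here \(\mathbf{f}_i\) is the force that drone \(i\) applies to the load through its cable or rigid link, \(m_L\) is the load mass, \(\mathbf{p}_L\) is the load position, and \(\mathbf{g}\) is the gravity vector. Coordinating the individual forces requires distributed control and communication.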
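
As a worked example of the load equation, consider a hovering load (\(\ddot{\mathbf{p}}_L = \mathbf{0}\)) with gravity vector \(\mathbf{g} = [0, 0, -g]^T\). The numbers below (four drones, a 2 kg load, vertical cables, equal sharing) are assumptions chosen for illustration:

\[ \sum_{i=1}^{4} \mathbf{f}_i = -m_L \mathbf{g} = \begin{bmatrix} 0 \\ 0 \\ m_L g \end{bmatrix}, \qquad \|\mathbf{f}_i\| = \frac{m_L g}{4} = \frac{2 \times 9.81}{4} \approx 4.9\ \text{N} \]

Each drone therefore needs about 4.9 N of thrust beyond what it uses to carry its own weight. With inclined cables the horizontal components must cancel across the team, so each cable tension is larger than this vertical share.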

Swarm Intelligence

Multi-Agent Coordination

Core problems in UAV swarms:

- Formation control: Maintain predetermined geometric configurations
- Collision avoidance: Prevent intra-swarm collisions
- Task allocation: Multi-robot division of labor

Reynolds Rules (inspired by bird flocks):

1. Separation: Avoid getting too close to neighbors
2. Alignment: Match velocity direction with neighbors
3. Cohesion: Move toward the center of neighbors

Mathematical formulation:

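\[ \mathbf{u}_i = c_s \sum_{j \in \mathcal{N}_i} \frac{\mathbf{p}_i - \mathbf{p}_j}{\|\mathbf{p}_i - \mathbf{p}_j\|^2} + c_a \frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} (\mathbf{v}_j - \mathbf{v}_i) + c_c \left(\frac{1}{|\mathcal{N}_i|}\sum_{j \in \mathcal{N}_i} \mathbf{p}_j - \mathbf{p}_i\right) \]

where \(\mathcal{N}_i\) is the set of neighbors of drone \(i\), and \(c_s\), \(c_a\), \(c_c\) weight separation, alignment, and cohesion. The alignment term is written relative to \(\mathbf{v}_i\) so that it vanishes once the velocities match.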
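
The three rules can be sketched for one agent as below. The neighbor set is taken to be all agents within a fixed radius, and the alignment term is computed relative to the agent's own velocity; both choices, along with the function name and default gains, are assumptions of this sketch.

```python
import math


def boids_command(i, positions, velocities, radius, c_s=1.0, c_a=1.0, c_c=1.0):
    """Reynolds-style command for agent i from neighbors within `radius`."""
    p_i, v_i = positions[i], velocities[i]
    dim = len(p_i)
    neighbors = [j for j in range(len(positions))
                 if j != i and math.dist(p_i, positions[j]) <= radius]
    if not neighbors:
        return [0.0] * dim
    n = len(neighbors)
    command = []
    for k in range(dim):
        # Separation: push away from each neighbor, weighted by 1 / distance^2.
        separation = sum(
            (p_i[k] - positions[j][k])
            / max(math.dist(p_i, positions[j]) ** 2, 1e-9)  # guard against overlap
            for j in neighbors)
        # Alignment: mean velocity difference to the neighbors.
        alignment = sum(velocities[j][k] - v_i[k] for j in neighbors) / n
        # Cohesion: vector from the agent to the neighbors' centroid.
        cohesion = sum(positions[j][k] for j in neighbors) / n - p_i[k]
        command.append(c_s * separation + c_a * alignment + c_c * cohesion)
    return command


positions = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
velocities = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
push_away = boids_command(0, positions, velocities, radius=2.0, c_c=0.0)
```

With cohesion switched off, agent 0 is pushed directly away from its single neighbor; the third agent has no neighbors in range and receives a zero command. Each agent needs only local information, which is what makes the rule set suitable for decentralized swarms.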

Communication and Decentralization

Centralized: All drones report to a central node which issues commands. Simple but single point of failure.
Decentralized: Each drone communicates only with neighbors. Robust but coordination is difficult.
Hierarchical: Leader-follower structure, a compromise.

Representative Swarm Systems

Crazyswarm2: Swarm research platform based on Crazyflie 2.1 micro quadrotors
ZJU FAST Lab (Fei Gao's team): Swarm of palm-sized drones flying autonomously through dense forests
EHang: Passenger-carrying autonomous aerial vehicles for Advanced Air Mobility (AAM), and large drone formation performances

Open-Source R&D Platforms

Platform	Mass	Features	Suitable Scenarios
Crazyflie 2.1	27 g	Ultra-light micro, Python/ROS, swarm-friendly	Indoor research/teaching
PX4 + QAV250	~400 g	Standard racing frame + PX4 flight controller	Outdoor autonomous flight
DJI RoboMaster TT	87 g	Tello EDU upgrade, programming interface	Education/entry-level
Agilicious (UZH)	~850 g	Designed for agile flight research	High-speed/racing research
Flightmare (simulation)	-	UZH RPG open-source simulator, Unity rendering	RL training
AirSim / Colosseum	-	Microsoft open-source simulation, Unreal rendering	Visual navigation research

References

Mellinger & Kumar, "Minimum Snap Trajectory Generation and Control for Quadrotors", ICRA, 2011
Lee et al., "Geometric Tracking Control of a Quadrotor UAV on SE(3)", CDC, 2010
Song et al., "Reaching the Limit in Autonomous Racing: Optimal Control Meets Reinforcement Learning", Science Robotics, 2023
Loquercio et al., "Learning High-Speed Flight in the Wild", Science Robotics, 2021
