
Visual-Inertial Odometry (VIO)

Overview

Visual-Inertial Odometry (VIO) is the technology of fusing camera and IMU data for real-time pose estimation. Cameras provide rich environmental texture information but have limited frame rates, while IMUs provide high-frequency motion information but suffer from drift -- the two complement each other to achieve robust 6-DOF pose tracking.

Why VIO Is Needed

| Sensor | Strengths | Limitations |
| --- | --- | --- |
| Camera | Rich texture, drift-free with loop closure | Low frame rate (~30 Hz), motion blur, sensitive to lighting |
| IMU | High frequency (200-1000 Hz), unaffected by lighting | Long-term drift, requires integration |
| VIO Fusion | High-frequency and high-accuracy, robust | Increased system complexity |

Fusion Method Classification

graph TD
    A[VIO Fusion Methods] --> B[Loosely Coupled]
    A --> C[Tightly Coupled]

    B --> B1[Visual odometry runs independently<br>IMU runs independently<br>Backend fuses both results]

    C --> C1[Visual features and IMU data<br>in the same optimization framework<br>jointly optimized]

    A --> D[By Backend Type]
    D --> D1[Filter-Based<br>EKF / MSCKF]
    D --> D2[Optimization-Based<br>Bundle Adjustment + IMU]

Loosely vs. Tightly Coupled

| Feature | Loosely Coupled | Tightly Coupled |
| --- | --- | --- |
| Fusion Level | Result-level fusion | Raw data-level fusion |
| Accuracy | Lower | Higher |
| Robustness | Degrades when a single sensor fails | More robust (IMU fills in when features are sparse) |
| Implementation Complexity | Low | High |
| Representative Systems | - | VINS-Mono, OKVIS, BASALT |

Tight Coupling Is Mainstream

Current mainstream VIO approaches in both academia and industry use tight coupling: it handles degraded scenarios (e.g., sparse texture, fast motion) better and is generally the preferred design.

IMU Preintegration Theory

Motivation

In optimization-based VIO, every time the optimizer updates the state estimate (in particular the IMU biases), the IMU measurements between keyframes would otherwise need to be re-integrated. IMU preintegration integrates the measurements between two keyframes once, into a relative motion constraint, avoiding this repeated integration.

Preintegration Quantities

Preintegration quantities between keyframes \(i\) and \(j\):

Rotation preintegration:

\[ \Delta \mathbf{R}_{ij} = \prod_{k=i}^{j-1} \text{Exp}\left((\tilde{\boldsymbol{\omega}}_k - \mathbf{b}_g) \Delta t\right) \]

Velocity preintegration:

\[ \Delta \mathbf{v}_{ij} = \sum_{k=i}^{j-1} \Delta \mathbf{R}_{ik} \cdot (\tilde{\mathbf{a}}_k - \mathbf{b}_a) \Delta t \]

Position preintegration:

\[ \Delta \mathbf{p}_{ij} = \sum_{k=i}^{j-1} \left[ \Delta \mathbf{v}_{ik} \Delta t + \frac{1}{2} \Delta \mathbf{R}_{ik} \cdot (\tilde{\mathbf{a}}_k - \mathbf{b}_a) \Delta t^2 \right] \]

Where:

  • \(\tilde{\boldsymbol{\omega}}_k\) is the gyroscope measurement
  • \(\tilde{\mathbf{a}}_k\) is the accelerometer measurement
  • \(\mathbf{b}_g\), \(\mathbf{b}_a\) are biases
  • \(\text{Exp}(\cdot)\) is the exponential map from \(\mathfrak{so}(3) \to SO(3)\)
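The three accumulation formulas above can be sketched as a simple loop over raw IMU samples. This is an illustrative implementation (function and variable names are my own, not from any specific library), assuming constant sample interval and ignoring noise and Jacobian bookkeeping:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def so3_exp(phi):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = np.linalg.norm(phi)
    if theta < 1e-10:
        return np.eye(3) + skew(phi)  # first-order approximation
    K = skew(phi / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def preintegrate(gyro, accel, dt, bg, ba):
    """gyro, accel: (N, 3) raw IMU samples; returns (dR, dv, dp)."""
    dR = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(gyro, accel):
        acc = dR @ (a - ba)                # bias-corrected accel rotated into frame i
        dp += dv * dt + 0.5 * acc * dt**2  # uses Delta v_ik, Delta R_ik at step k
        dv += acc * dt                     # then update Delta v
        dR = dR @ so3_exp((w - bg) * dt)   # rotation update last, matching the sums
    return dR, dv, dp
```

Note the update order: position uses the step-`k` values of \(\Delta\mathbf{v}_{ik}\) and \(\Delta\mathbf{R}_{ik}\) before they are advanced, exactly as the sums above prescribe.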

Core Advantage of Preintegration

When bias estimates \(\mathbf{b}_g\), \(\mathbf{b}_a\) change slightly, preintegration quantities can be updated through first-order approximation without re-integration:

\[ \Delta \mathbf{R}_{ij}(\mathbf{b}_g + \delta \mathbf{b}_g) \approx \Delta \mathbf{R}_{ij}(\mathbf{b}_g) \cdot \text{Exp}\left(\frac{\partial \Delta \mathbf{R}_{ij}}{\partial \mathbf{b}_g} \delta \mathbf{b}_g\right) \]

This greatly reduces computation during optimization iterations.
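As a minimal sketch of the first-order update above: `J_R_bg` stands for the Jacobian \(\partial \Delta \mathbf{R}_{ij} / \partial \mathbf{b}_g\), which a real system accumulates alongside the preintegration (here it is simply passed in as a placeholder):

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: so(3) vector -> rotation matrix."""
    phi = np.asarray(phi, dtype=float)
    theta = np.linalg.norm(phi)
    K = np.array([[0, -phi[2], phi[1]],
                  [phi[2], 0, -phi[0]],
                  [-phi[1], phi[0], 0]])
    if theta < 1e-10:
        return np.eye(3) + K
    K /= theta
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def correct_rotation_preintegration(dR, J_R_bg, delta_bg):
    """First-order update of Delta R for a small gyro-bias change delta_bg,
    using the stored Jacobian J_R_bg -- no re-integration of raw samples."""
    return dR @ so3_exp(J_R_bg @ delta_bg)
```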

Preintegration Covariance Propagation

Uncertainty of preintegration quantities is propagated through the covariance matrix:

\[ \boldsymbol{\Sigma}_{k+1} = \mathbf{A}_k \boldsymbol{\Sigma}_k \mathbf{A}_k^T + \mathbf{B}_k \mathbf{Q} \mathbf{B}_k^T \]

Where \(\mathbf{A}_k\) is the state transition matrix and \(\mathbf{Q}\) is the IMU noise covariance.
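One discrete step of this recursion is a direct transcription of the equation; the matrix dimensions below are illustrative (a 9-dimensional preintegration error state driven by 6-dimensional IMU noise):

```python
import numpy as np

def propagate_covariance(Sigma, A, B, Q):
    """One step of Sigma_{k+1} = A Sigma A^T + B Q B^T."""
    return A @ Sigma @ A.T + B @ Q @ B.T
```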

Factor Graph Formulation

VIO can be naturally modeled as a factor graph optimization problem:

graph LR
    subgraph "State Variables"
    X0["X0<br>(p,R,v,b)"] 
    X1["X1<br>(p,R,v,b)"]
    X2["X2<br>(p,R,v,b)"]
    end

    subgraph "Landmarks"
    L1["L1"] 
    L2["L2"]
    L3["L3"]
    end

    X0 --- |"IMU<br>Preintegration"| X1
    X1 --- |"IMU<br>Preintegration"| X2

    X0 --- |"Visual<br>Observation"| L1
    X0 --- |"Visual<br>Observation"| L2
    X1 --- |"Visual<br>Observation"| L1
    X1 --- |"Visual<br>Observation"| L3
    X2 --- |"Visual<br>Observation"| L2
    X2 --- |"Visual<br>Observation"| L3

    P["Prior Factor"] --- X0

State Vector

State at each keyframe:

\[ \mathbf{x}_k = \begin{bmatrix} \mathbf{p}_k \\ \mathbf{R}_k \\ \mathbf{v}_k \\ \mathbf{b}_{a,k} \\ \mathbf{b}_{g,k} \end{bmatrix} \in \mathbb{R}^3 \times SO(3) \times \mathbb{R}^3 \times \mathbb{R}^3 \times \mathbb{R}^3 \]

Optimization Objective

\[ \mathbf{x}^* = \arg\min_{\mathbf{x}} \left\{ \| \mathbf{r}_p \|^2 + \sum_{k} \| \mathbf{r}_{\text{IMU}}^k \|_{\Sigma_k}^2 + \sum_{(i,j)} \| \mathbf{r}_{\text{vis}}^{ij} \|_{\Sigma_{ij}}^2 \right\} \]

Where:

  • \(\mathbf{r}_p\) is the prior residual
  • \(\mathbf{r}_{\text{IMU}}^k\) is the IMU preintegration residual
  • \(\mathbf{r}_{\text{vis}}^{ij}\) is the visual reprojection residual
  • \(\| \cdot \|_{\Sigma}^2 = (\cdot)^T \Sigma^{-1} (\cdot)\) is the Mahalanobis distance
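Evaluating this objective for a given state amounts to summing squared Mahalanobis norms of the stacked residuals. A toy sketch (residuals and covariances are placeholder inputs; a real solver such as Ceres or GTSAM minimizes this cost over the states):

```python
import numpy as np

def mahalanobis_sq(r, Sigma):
    """||r||_Sigma^2 = r^T Sigma^{-1} r."""
    return float(r @ np.linalg.solve(Sigma, r))

def total_cost(r_prior, imu_terms, vis_terms):
    """imu_terms, vis_terms: lists of (residual, covariance) pairs."""
    cost = float(r_prior @ r_prior)  # prior term, identity weighting for simplicity
    for r, S in imu_terms:
        cost += mahalanobis_sq(r, S)
    for r, S in vis_terms:
        cost += mahalanobis_sq(r, S)
    return cost
```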

Classic VIO Systems

VINS-Mono

A classic tightly-coupled monocular VIO system open-sourced by HKUST, the standard reference for learning VIO.

System Architecture:

graph TD
    A[IMU Data] --> B[Preintegration]
    C[Images] --> D[Feature Tracking<br>KLT Optical Flow]

    B --> E[Initialization]
    D --> E

    E --> F[Nonlinear Optimization<br>Sliding Window]
    F --> G[Marginalization]

    F --> H[Loop Closure Detection<br>DBoW2]
    H --> I[Global Pose Graph Optimization]

    F --> J[Output Pose]
    I --> J

Key Techniques:

| Module | Method |
| --- | --- |
| Frontend | KLT optical flow feature tracking |
| Initialization | Visual SfM + IMU alignment |
| Backend | Sliding window + marginalization |
| Loop Closure | DBoW2 bag-of-words detection + pose graph optimization |
| Optimizer | Ceres Solver |

OKVIS

Open-sourced by ETH, a keyframe-based tightly-coupled VIO supporting stereo.

MSCKF

An efficient EKF-based VIO that avoids keeping feature points in the state by using multi-state constraints. The state holds the IMU state plus a sliding window of cloned camera poses:

\[ \mathbf{x} = \begin{bmatrix} \mathbf{x}_{\text{IMU}} \\ \mathbf{x}_{C_1} \\ \mathbf{x}_{C_2} \\ \vdots \end{bmatrix} \]

Feature points are never added to the state vector; instead, when a feature's track is lost, its observations across the windowed camera poses are used to form a constraint among those poses, with the feature itself marginalized out.
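The sliding-window bookkeeping can be illustrated with a toy class (names and dimensions are illustrative only, not from any MSCKF implementation): each new image clones the current camera pose into the state, and the oldest clone is dropped when the window is full.

```python
import numpy as np

class MsckfState:
    """Toy MSCKF state: IMU state plus a window of camera pose clones."""

    def __init__(self, imu_dim=15, max_clones=5):
        self.x_imu = np.zeros(imu_dim)  # e.g. p, q, v, b_a, b_g
        self.clones = []                # list of 6-DOF camera pose vectors
        self.max_clones = max_clones

    def augment(self, cam_pose):
        """Clone the current camera pose into the state window."""
        self.clones.append(np.asarray(cam_pose, dtype=float))
        if len(self.clones) > self.max_clones:
            self.clones.pop(0)          # marginalize the oldest clone
```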

BASALT

Open-sourced by TUM, a high-accuracy VIO using stereo + IMU, featuring efficient nonlinear optimization and an optical flow frontend.

System Comparison

| System | Coupling | Backend | Camera Config | Highlight |
| --- | --- | --- | --- | --- |
| VINS-Mono | Tight | Optimization | Monocular | Complete system, loop closure |
| VINS-Fusion | Tight | Optimization | Mono/Stereo/Multi | Multi-sensor |
| OKVIS | Tight | Optimization | Stereo | Keyframe management |
| MSCKF | Tight | EKF | Stereo | Computationally efficient |
| BASALT | Tight | Optimization | Stereo | High accuracy |
| ORB-SLAM3 | Tight | Optimization | Mono/Stereo/RGB-D | Map reuse |

Practical Deployment Considerations

Camera-IMU Calibration

The extrinsic \(\mathbf{T}_{CI}\) (relative pose) between camera and IMU must be precisely calibrated:

# Using Kalibr
rosrun kalibr kalibr_calibrate_imu_camera \
    --bag calibration.bag \
    --cam cam_config.yaml \
    --imu imu_config.yaml \
    --target april_grid.yaml

Time Synchronization

The time offset \(t_d\) between camera and IMU severely affects accuracy:

\[ t_{\text{cam}} = t_{\text{IMU}} + t_d \]

VINS-Mono supports online estimation of \(t_d\), but hardware-level time synchronization (e.g., trigger signals) is more reliable.
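In practice the model \(t_{\text{cam}} = t_{\text{IMU}} + t_d\) is applied by shifting camera timestamps into the IMU clock before associating measurements. A minimal sketch (the function name and the specific \(t_d\) value are illustrative assumptions):

```python
def imu_samples_between_frames(imu_stamps, t_cam_prev, t_cam_curr, t_d):
    """Indices of IMU samples falling between two camera frames, after
    shifting camera timestamps into the IMU clock: t_imu = t_cam - t_d."""
    t0 = t_cam_prev - t_d
    t1 = t_cam_curr - t_d
    return [i for i, t in enumerate(imu_stamps) if t0 <= t < t1]
```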

Common Failure Scenarios

| Scenario | Cause | Mitigation |
| --- | --- | --- |
| Texture-poor areas | White walls, sky | Rely on IMU short-term prediction |
| Fast rotation | Motion blur | High frame rate camera, global shutter |
| Drastic lighting changes | Feature tracking failure | HDR camera, robust features |
| Long corridors | Degenerate motion | Multi-sensor fusion |
| Pure rotation | Monocular scale unobservable | Stereo or IMU constraints |

For more SLAM-related content, see SLAM Topic

References

  • Qin et al., "VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator", IEEE Trans. Robotics, 2018
  • Forster et al., "On-Manifold Preintegration for Real-Time Visual-Inertial Odometry", IEEE Trans. Robotics, 2017
  • Mourikis & Roumeliotis, "A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation", ICRA 2007
  • Sola, "Quaternion kinematics for the error-state Kalman filter", 2017
  • Kalibr Calibration Tool: https://github.com/ethz-asl/kalibr
