SLAM: Simultaneous Localization and Mapping

Overview

SLAM (Simultaneous Localization and Mapping) is the problem of estimating a robot's pose while simultaneously building a map of an unknown environment. It is a core technology for autonomous navigation of mobile robots.

Probabilistic Formulation of the SLAM Problem:

\[ p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \]

where:

| Symbol | Meaning |
| --- | --- |
| \(x_{1:t}\) | Robot pose sequence from time 1 to \(t\) |
| \(m\) | Environment map |
| \(z_{1:t}\) | Sensor observation sequence |
| \(u_{1:t}\) | Control input (odometry) sequence |
```mermaid
graph LR
    A["Sensor Data<br/>Camera/LiDAR/IMU"] --> B["Front End<br/>Data Association/Feature Extraction"]
    B --> C["State Estimation<br/>Filtering/Optimization"]
    C --> D["Back End<br/>Loop Closure/Global Optimization"]
    D --> E["Output<br/>Pose + Map"]
    D -->|"Loop Closure Constraints"| C
    style B fill:#e8f4fd,stroke:#2196F3
    style C fill:#fff3e0,stroke:#FF9800
    style D fill:#f3e5f5,stroke:#9C27B0
```

1 Mathematical Foundations of SLAM

1.1 Motion Model

\[ x_t = f(x_{t-1}, u_t) + w_t, \quad w_t \sim \mathcal{N}(0, R_t) \]

1.2 Observation Model

\[ z_t = h(x_t, m) + v_t, \quad v_t \sim \mathcal{N}(0, Q_t) \]

1.3 Two Forms of SLAM

| Form | Estimated Quantity | Methods |
| --- | --- | --- |
| Online SLAM | \(p(x_t, m \mid z_{1:t}, u_{1:t})\) | EKF-SLAM, particle filter |
| Full SLAM | \(p(x_{1:t}, m \mid z_{1:t}, u_{1:t})\) | Graph optimization |

2 Filter-Based SLAM

2.1 EKF-SLAM

EKF-SLAM is the classical filtering approach to SLAM, maintaining a joint Gaussian distribution over the robot pose and all landmark positions.

State Vector:

\[ \hat{\mu}_t = \begin{bmatrix} x_t \\ m_1 \\ m_2 \\ \vdots \\ m_N \end{bmatrix}, \quad \Sigma_t = \begin{bmatrix} \Sigma_{xx} & \Sigma_{x m_1} & \cdots & \Sigma_{x m_N} \\ \Sigma_{m_1 x} & \Sigma_{m_1 m_1} & \cdots & \Sigma_{m_1 m_N} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{m_N x} & \Sigma_{m_N m_1} & \cdots & \Sigma_{m_N m_N} \end{bmatrix} \]

Prediction Step (Motion Update):

\[ \bar{\mu}_t = f(\hat{\mu}_{t-1}, u_t) \]
\[ \bar{\Sigma}_t = F_t \Sigma_{t-1} F_t^T + R_t \]

Update Step (Observation Update):

\[ K_t = \bar{\Sigma}_t H_t^T (H_t \bar{\Sigma}_t H_t^T + Q_t)^{-1} \]
\[ \hat{\mu}_t = \bar{\mu}_t + K_t (z_t - h(\bar{\mu}_t)) \]
\[ \Sigma_t = (I - K_t H_t) \bar{\Sigma}_t \]

where \(F_t = \frac{\partial f}{\partial x}\big|_{\hat{\mu}_{t-1}}\) and \(H_t = \frac{\partial h}{\partial x}\big|_{\bar{\mu}_t}\) are the Jacobian matrices.
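The two steps map directly onto a few lines of linear algebra. Below is a minimal numpy sketch of one EKF-SLAM iteration; the motion model `f`, observation model `h`, their Jacobians `F` and `H`, and the noise covariances are assumed to be supplied by the application.

```python
import numpy as np

def ekf_predict(mu, Sigma, u, f, F, R):
    """Prediction: propagate the mean through the motion model, covariance via the Jacobian."""
    mu_bar = f(mu, u)                     # \bar{mu}_t = f(mu_{t-1}, u_t)
    F_t = F(mu, u)                        # F_t = df/dx at mu_{t-1}
    Sigma_bar = F_t @ Sigma @ F_t.T + R   # \bar{Sigma}_t
    return mu_bar, Sigma_bar

def ekf_update(mu_bar, Sigma_bar, z, h, H, Q):
    """Correction: fuse the observation z via the Kalman gain."""
    H_t = H(mu_bar)                                   # H_t = dh/dx at \bar{mu}_t
    S = H_t @ Sigma_bar @ H_t.T + Q                   # innovation covariance
    K = Sigma_bar @ H_t.T @ np.linalg.inv(S)          # Kalman gain
    mu = mu_bar + K @ (z - h(mu_bar))                 # corrected mean
    Sigma = (np.eye(len(mu)) - K @ H_t) @ Sigma_bar   # corrected covariance
    return mu, Sigma
```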

Limitations of EKF-SLAM:

  • Computational complexity \(O(N^2)\) (\(N\) = number of landmarks), unsuitable for large-scale environments
  • Linearization errors lead to inconsistency
  • Data association errors are difficult to recover from

2.2 Particle Filter SLAM (FastSLAM)

FastSLAM (Montemerlo et al., 2002) uses Rao-Blackwellized particle filtering to decompose the SLAM problem:

\[ p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) = p(x_{1:t} \mid z_{1:t}, u_{1:t}) \prod_{j=1}^{N} p(m_j \mid x_{1:t}, z_{1:t}) \]
  • Particles represent the path posterior \(p(x_{1:t} \mid \cdot)\)
  • Each particle maintains an independent map (EKF update for each landmark)
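The per-timestep structure can be sketched as follows, assuming hypothetical helpers `sample_motion`, `measurement_likelihood`, and `ekf_update_landmark` for the model-specific parts (the last must initialize a landmark when passed `None`):

```python
import numpy as np

def fastslam_step(particles, u, z, sample_motion, measurement_likelihood,
                  ekf_update_landmark):
    """One Rao-Blackwellized update; each particle is {"pose": ..., "map": {id: landmark}}."""
    weights = np.empty(len(particles))
    for i, p in enumerate(particles):
        p["pose"] = sample_motion(p["pose"], u)    # sample from p(x_t | x_{t-1}, u_t)
        weights[i] = measurement_likelihood(p, z)  # importance weight p(z_t | x_t, m)
        for j, obs in z.items():                   # independent small EKF per landmark
            p["map"][j] = ekf_update_landmark(p["map"].get(j), p["pose"], obs)
    weights /= weights.sum()
    # Systematic resampling concentrates particles in high-weight regions.
    idx = np.searchsorted(np.cumsum(weights),
                          (np.arange(len(particles)) + np.random.rand()) / len(particles))
    return [dict(pose=particles[k]["pose"], map=dict(particles[k]["map"])) for k in idx]
```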

Advantages:

  • Computational complexity \(O(M \log N)\) (\(M\) = number of particles, \(N\) = number of landmarks)
  • Naturally handles multimodal distributions
  • Data association can be handled independently per particle

3 Visual SLAM

3.1 Overview

Visual SLAM uses cameras as the primary sensor. Approaches fall into three categories:

  • Feature-based: Extract feature points, estimate pose based on geometric constraints
  • Direct: Directly use pixel intensity values to optimize pose
  • Semi-direct: Combines advantages of both (e.g., SVO)

3.2 ORB-SLAM3 System

ORB-SLAM3 (Campos et al., 2021) is one of the most complete open-source visual SLAM systems, supporting monocular, stereo, RGB-D, and visual-inertial modes.

System Pipeline:

```mermaid
graph TB
    A["Image Input"] --> B["ORB Feature Extraction"]
    B --> C["Feature Matching"]
    C --> D["Tracking Thread<br/>(Frame-to-Frame Pose Estimation)"]
    D --> E["Keyframe Decision"]
    E -->|"Is Keyframe"| F["Local Mapping Thread<br/>(Local BA)"]
    F --> G["Loop Closure Thread<br/>(DBoW2)"]
    G -->|"Loop Detected"| H["Pose Graph Optimization"]
    H --> I["Global BA"]
    D --> J["Pose Output"]
    F --> K["Map Point Cloud"]
    style D fill:#e8f4fd,stroke:#2196F3
    style F fill:#fff3e0,stroke:#FF9800
    style G fill:#f3e5f5,stroke:#9C27B0
```

Core Modules:

1. Feature Extraction

Uses ORB (Oriented FAST and Rotated BRIEF) features; a minimal OpenCV sketch follows the list:

  • FAST corner detection + orientation computation
  • rBRIEF descriptor (256-bit binary)
  • Image pyramid for scale invariance
  • Fast extraction; matching uses Hamming distance
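A minimal OpenCV sketch of ORB extraction and Hamming-distance matching (the image paths and parameter values are illustrative, not ORB-SLAM3's internal pipeline):

```python
import cv2

# Load two grayscale frames (paths are placeholders).
img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

# ORB = oriented FAST keypoints + rBRIEF binary descriptors over an image pyramid.
orb = cv2.ORB_create(nfeatures=1000, scaleFactor=1.2, nlevels=8)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Binary descriptors are compared by Hamming distance.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
```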

2. Tracking

  • Constant velocity model for initial pose prediction
  • Feature matching + PnP solving
  • Local map tracking: match against local map points, optimize current pose
  • Keyframe selection strategy

3. Local Mapping

  • New keyframe insertion
  • Map point triangulation
  • Local Bundle Adjustment
\[ \min_{\{T_i\}, \{p_j\}} \sum_{i,j} \rho\left(\|z_{ij} - \pi(T_i, p_j)\|^2_{\Sigma_{ij}}\right) \]

where \(\pi\) is the projection function and \(\rho\) is a robust kernel function (e.g., Huber).
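As a concrete toy instance, the sketch below jointly optimizes two planar poses and three landmarks with `scipy.optimize.least_squares`, whose `loss="huber"` option plays the role of the robust kernel \(\rho\). The 2D "projection" `pi` and all measurement values are illustrative assumptions, not ORB-SLAM3's actual parametrization.

```python
import numpy as np
from scipy.optimize import least_squares

def pi(pose, point):
    """Toy projection: express a 2D landmark in the frame of pose (x, y, theta)."""
    x, y, th = pose
    c, s = np.cos(th), np.sin(th)
    d = point - np.array([x, y])
    return np.array([c * d[0] + s * d[1], -s * d[0] + c * d[1]])

def residuals(params, obs, n_poses):
    """Stack reprojection residuals z_ij - pi(T_i, p_j) over all observations."""
    poses = params[:3 * n_poses].reshape(-1, 3)
    points = params[3 * n_poses:].reshape(-1, 2)
    return np.concatenate([obs[i, j] - pi(poses[i], points[j])
                           for (i, j) in sorted(obs)])

# Two poses observing three landmarks (synthetic measurements).
obs = {(0, 0): np.array([1., 0.]), (0, 1): np.array([1., 1.]), (0, 2): np.array([2., 1.]),
       (1, 0): np.array([0., 0.]), (1, 1): np.array([0., 1.]), (1, 2): np.array([1., 1.])}
# loss="huber" applies the robust kernel to down-weight outlier residuals.
sol = least_squares(residuals, np.zeros(12), args=(obs, 2), loss="huber", f_scale=1.0)
```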

4. Loop Closure

  • Bag-of-words model (DBoW2) for image similarity retrieval
  • Geometric verification (RANSAC + Sim(3)/SE(3))
  • Pose graph optimization (correcting accumulated drift)
  • Optional global BA

3.3 Visual-Inertial SLAM (VIO)

Visual-inertial systems fuse IMU measurements via pre-integration between consecutive keyframes \(i\) and \(j\):

\[ \Delta R_{ij} = \prod_{k=i}^{j-1} \text{Exp}((\omega_k - b_g) \Delta t) \]
\[ \Delta v_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik} (a_k - b_a) \Delta t \]
\[ \Delta p_{ij} = \sum_{k=i}^{j-1} \left[ \Delta v_{ik} \Delta t + \frac{1}{2} \Delta R_{ik} (a_k - b_a) \Delta t^2 \right] \]
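A minimal numpy sketch of this recursion follows (the bias Jacobians and noise covariance propagation that real VIO back ends also maintain are omitted):

```python
import numpy as np

def exp_so3(w):
    """Rodrigues' formula: map a rotation vector to a rotation matrix."""
    th = np.linalg.norm(w)
    if th < 1e-9:
        return np.eye(3)
    k = w / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * K @ K

def preintegrate(gyro, accel, dt, bg, ba):
    """Accumulate Delta R, Delta v, Delta p from IMU samples between keyframes i and j."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        dp = dp + dv * dt + 0.5 * dR @ (a - ba) * dt**2  # uses Delta R_ik, Delta v_ik
        dv = dv + dR @ (a - ba) * dt
        dR = dR @ exp_so3((w - bg) * dt)
    return dR, dv, dp
```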

VIO systems maintain robustness in texture-poor and fast-motion scenarios.


4 LiDAR SLAM

4.1 Overview

LiDAR SLAM uses laser scanners to obtain precise 3D point clouds (see Sensors), suitable for large-scale outdoor environments.

4.2 LOAM (Lidar Odometry and Mapping)

LOAM (Zhang & Singh, 2014) is one of the most influential LiDAR SLAM algorithms.

Core Idea: Decompose the problem into a high-frequency odometry module and a low-frequency mapping module.

Feature Extraction:

  • Edge points: Points with high curvature
  • Planar points: Points with low curvature

Point-to-Edge/Plane Distance Minimization:

Edge point residual:

\[ d_e = \frac{|(p_i - p_a) \times (p_i - p_b)|}{|p_a - p_b|} \]

Planar point residual:

\[ d_p = \frac{\left| (p_i - p_a) \cdot \left( (p_a - p_b) \times (p_a - p_c) \right) \right|}{|(p_a - p_b) \times (p_a - p_c)|} \]
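Both residuals are one-liners in numpy; the sketch below implements them directly from the formulas above.

```python
import numpy as np

def edge_residual(p_i, p_a, p_b):
    """Distance from p_i to the line through edge points p_a, p_b."""
    return np.linalg.norm(np.cross(p_i - p_a, p_i - p_b)) / np.linalg.norm(p_a - p_b)

def plane_residual(p_i, p_a, p_b, p_c):
    """Distance from p_i to the plane through p_a, p_b, p_c."""
    n = np.cross(p_a - p_b, p_a - p_c)            # plane normal
    return abs(np.dot(p_i - p_a, n)) / np.linalg.norm(n)

# Example: a point 1 m above the x-y plane has plane residual 1.0.
print(plane_residual(np.array([0., 0., 1.]), np.array([0., 0., 0.]),
                     np.array([1., 0., 0.]), np.array([0., 1., 0.])))
```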

4.3 Cartographer

Google Cartographer (Hess et al., 2016) is a widely used 2D/3D SLAM system.

Front End:

  • Submap construction: Uses scan matching to insert new scans into submaps
  • Local optimization: Ceres Solver for scan matching

Back End:

  • Loop closure detection: Branch-and-bound search
  • Pose graph optimization: Sparse Pose Adjustment (SPA)

4.4 Other LiDAR SLAM Systems

| System | Features |
| --- | --- |
| LeGO-LOAM | Ground segmentation + lightweight LOAM |
| LIO-SAM | Tightly-coupled LiDAR-IMU, factor graph optimization |
| FAST-LIO2 | Tightly-coupled, ikd-Tree acceleration, highly efficient |
| CT-ICP | Continuous-time ICP, handles motion distortion |

5 Graph-Based SLAM

5.1 Pose Graph Optimization

Modeling SLAM as a graph optimization problem:

  • Nodes: Robot poses \(x_i\)
  • Edges: Relative constraints between poses \(z_{ij}\) (odometry, loop closure)

Optimization Objective:

\[ x^* = \arg\min_x \sum_{(i,j) \in \mathcal{E}} \|e_{ij}(x_i, x_j)\|^2_{\Omega_{ij}} \]

where the error function:

\[ e_{ij} = z_{ij}^{-1} \oplus (x_i^{-1} \oplus x_j) \]

\(\Omega_{ij}\) is the information matrix (inverse of covariance).
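For SE(2) poses parameterized as \((x, y, \theta)\), the edge error is a short numpy sketch; `inv` and `oplus` implement pose inversion and composition:

```python
import numpy as np

def inv(p):
    """Inverse of an SE(2) pose (x, y, theta)."""
    x, y, th = p
    c, s = np.cos(th), np.sin(th)
    return np.array([-c * x - s * y, s * x - c * y, -th])

def oplus(a, b):
    """Compose two SE(2) poses: a (+) b."""
    xa, ya, tha = a
    c, s = np.cos(tha), np.sin(tha)
    return np.array([xa + c * b[0] - s * b[1], ya + s * b[0] + c * b[1], tha + b[2]])

def edge_error(x_i, x_j, z_ij):
    """e_ij = z_ij^{-1} (+) (x_i^{-1} (+) x_j); zero when the measurement is satisfied."""
    e = oplus(inv(z_ij), oplus(inv(x_i), x_j))
    e[2] = np.arctan2(np.sin(e[2]), np.cos(e[2]))  # wrap angle to (-pi, pi]
    return e
```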

5.2 Factor Graphs

Factor graphs are a more general graphical model representation:

  • Variable nodes: States to be estimated (poses, landmarks, sensor biases, etc.)
  • Factor nodes: Constraints (priors, odometry, observations, loop closures)
\[ p(X) \propto \prod_k f_k(X_k) \]

Common Factor Types:

| Factor | Connected Variables | Information Source |
| --- | --- | --- |
| Prior factor | \(x_0\) | Initial pose |
| Odometry factor | \(x_{t-1}, x_t\) | Wheel/visual/inertial odometry |
| Landmark observation factor | \(x_t, l_j\) | Camera/LiDAR observation |
| Loop closure factor | \(x_i, x_j\) | Loop closure detection |
| IMU pre-integration factor | \(x_i, x_j, v_i, v_j, b_i\) | IMU data |
| GPS factor | \(x_t\) | GPS localization |
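As an illustration, the sketch below builds a tiny pose graph with GTSAM's Python bindings: a prior anchoring the first pose, two odometry factors, and one loop closure, optimized with Levenberg-Marquardt. All numeric values are synthetic.

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
prior_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))
odom_noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.2, 0.2, 0.1]))

# Prior factor fixes the gauge; between factors encode odometry and the loop closure.
graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), prior_noise))
graph.add(gtsam.BetweenFactorPose2(0, 1, gtsam.Pose2(1, 0, 0), odom_noise))
graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1, 0, 0), odom_noise))
graph.add(gtsam.BetweenFactorPose2(2, 0, gtsam.Pose2(-2, 0, 0), odom_noise))  # loop closure

initial = gtsam.Values()
initial.insert(0, gtsam.Pose2(0.0, 0.0, 0.0))
initial.insert(1, gtsam.Pose2(1.1, 0.1, 0.05))    # deliberately perturbed guesses
initial.insert(2, gtsam.Pose2(2.1, -0.1, -0.05))
result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
```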

5.3 Solution Methods

The graph optimization problem ultimately reduces to nonlinear least squares:

\[ \min_x \sum_k \|r_k(x)\|^2_{\Sigma_k} \]

Solved iteratively using Gauss-Newton or Levenberg-Marquardt:

\[ (J^T \Sigma^{-1} J) \Delta x = -J^T \Sigma^{-1} r \]

Sparsity

The information matrix \(H = J^T \Sigma^{-1} J\) in SLAM problems is sparse (because only adjacent/co-visible variables have constraints between them). Efficient solving is achieved through sparse Cholesky decomposition.
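A single Gauss-Newton step with sparse normal equations might look like the following sketch, where `jacobian` is a placeholder returning a whitened sparse \(J\) and residual \(r\):

```python
from scipy.sparse.linalg import spsolve

def gauss_newton_step(x, jacobian):
    """One GN step: form the sparse normal equations and solve for the update."""
    J, r = jacobian(x)         # J: scipy.sparse matrix, r: residual vector (whitened)
    H = (J.T @ J).tocsc()      # sparse information matrix J^T J stays sparse
    dx = spsolve(H, -J.T @ r)  # sparse direct solve; dedicated SLAM solvers use
    return x + dx              # sparse Cholesky (e.g., CHOLMOD) here
```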

Common Optimization Libraries:

  • g2o: General graph optimization framework
  • GTSAM: Factor graph-based, supports incremental optimization (iSAM2)
  • Ceres Solver: General nonlinear least squares solver

6 Learning-Enhanced SLAM

6.1 Depth Estimation

  • MonoDepth2: Self-supervised monocular depth estimation
  • DPT: Transformer-based depth prediction

6.2 Feature Extraction and Matching

  • SuperPoint: Self-supervised keypoint detection
  • SuperGlue: Graph Neural Network-based feature matching
  • LightGlue: Lightweight feature matching

6.3 End-to-End Learned SLAM

DROID-SLAM (Teed & Deng, 2021):

  • Dense matching based on RAFT optical flow network
  • Differentiable Bundle Adjustment layer
  • End-to-end training with strong generalization
\[ \min_{\{T_i\}, \{d_j\}} \sum_{(i,j)} \|p_{ij}^* - \Pi(T_i \cdot T_j^{-1}, d_j)\|^2_{\Sigma_{ij}} \]

where \(p_{ij}^*\) are network-predicted correspondences; gradient propagation is achieved through differentiable optimization by unrolling iterations.

6.4 Implicit Representations

  • iMAP: Uses NeRF as the map representation
  • NICE-SLAM: Hierarchical implicit representation
  • SplaTAM: 3D Gaussian Splatting-based SLAM

7 SLAM Evaluation Metrics

7.1 Trajectory Accuracy

Absolute Trajectory Error (ATE), computed after rigidly aligning the estimated trajectory to the ground truth:

\[ \text{ATE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \|p_i^{\text{est}} - p_i^{\text{gt}}\|^2} \]

Relative Pose Error (RPE), which measures local drift over a fixed interval \(\Delta\):

\[ \text{RPE}_i = (T_i^{\text{gt}})^{-1} T_{i+\Delta}^{\text{gt}} \cdot ((T_i^{\text{est}})^{-1} T_{i+\Delta}^{\text{est}})^{-1} \]
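A minimal numpy sketch of ATE with Kabsch/Umeyama rigid alignment (translation-only error, no scale correction; `p_est` and `p_gt` are time-synchronized N×3 position arrays):

```python
import numpy as np

def ate_rmse(p_est, p_gt):
    """RMSE of translational error after rigid (no-scale) alignment."""
    mu_e, mu_g = p_est.mean(0), p_gt.mean(0)
    H = (p_est - mu_e).T @ (p_gt - mu_g)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # best-fit proper rotation (Kabsch)
    aligned = (R @ (p_est - mu_e).T).T + mu_g
    return np.sqrt(np.mean(np.sum((aligned - p_gt) ** 2, axis=1)))
```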

7.2 Standard Datasets

| Dataset | Sensors | Scenario |
| --- | --- | --- |
| KITTI | Stereo + LiDAR + GPS | Outdoor driving |
| EuRoC | Stereo + IMU | Indoor drone |
| TUM RGB-D | RGB-D | Indoor handheld |
| Newer College | Multi-LiDAR + camera | Outdoor walking |

7.3 Evaluation Tools

  • evo: Python trajectory evaluation toolkit
  • rpg_trajectory_evaluation: Evaluation tool from the Robotics and Perception Group (RPG) at the University of Zurich

8 System Integration

SLAM systems typically run under the ROS/ROS2 framework (see ROS2), coordinating with the navigation stack:

  • SLAM provides pose estimation and maps
  • The navigation stack performs path planning based on the map
  • Controllers execute motion commands

Sensor selection and calibration are crucial for SLAM performance (see Sensors).


References

  1. Thrun, S., Burgard, W. & Fox, D. (2005). Probabilistic Robotics. MIT Press.
  2. Cadena, C. et al. (2016). Past, present, and future of simultaneous localization and mapping. IEEE TRO.
  3. Campos, C. et al. (2021). ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM. IEEE TRO.
  4. Zhang, J. & Singh, S. (2014). LOAM: Lidar Odometry and Mapping in Real-time. RSS.
  5. Teed, Z. & Deng, J. (2021). DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. NeurIPS.
  6. Grisetti, G. et al. (2010). A tutorial on graph-based SLAM. IEEE ITSM.
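  7. Montemerlo, M. et al. (2002). FastSLAM: A factored solution to the simultaneous localization and mapping problem. AAAI.
  8. Hess, W. et al. (2016). Real-time loop closure in 2D LIDAR SLAM. IEEE ICRA.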
