SLAM: Simultaneous Localization and Mapping
Overview
SLAM (Simultaneous Localization and Mapping) is the problem of estimating a robot's pose while simultaneously building a map of an unknown environment. It is a core capability for autonomous navigation of mobile robots.
Probabilistic Formulation of the SLAM Problem:
\[
p(x_{1:t}, m \mid z_{1:t}, u_{1:t})
\]
where:
| Symbol | Meaning |
|---|---|
| \(x_{1:t}\) | Robot pose sequence from time 1 to \(t\) |
| \(m\) | Environment map |
| \(z_{1:t}\) | Sensor observation sequence |
| \(u_{1:t}\) | Control input (odometry) sequence |
```mermaid
graph LR
A["Sensor Data<br/>Camera/LiDAR/IMU"] --> B["Front End<br/>Data Association/Feature Extraction"]
B --> C["State Estimation<br/>Filtering/Optimization"]
C --> D["Back End<br/>Loop Closure/Global Optimization"]
D --> E["Output<br/>Pose + Map"]
D -->|"Loop Closure Constraints"| C
style B fill:#e8f4fd,stroke:#2196F3
style C fill:#fff3e0,stroke:#FF9800
style D fill:#f3e5f5,stroke:#9C27B0
```
1 Mathematical Foundations of SLAM
1.1 Motion Model
\[
p(x_t \mid x_{t-1}, u_t), \qquad x_t = f(x_{t-1}, u_t) + w_t
\]
where \(f\) is the motion function and \(w_t\) is process noise.
1.2 Observation Model
\[
p(z_t \mid x_t, m), \qquad z_t = h(x_t, m) + v_t
\]
where \(h\) is the observation function and \(v_t\) is measurement noise.
1.3 Two Forms of SLAM
| Form | Estimated Quantity | Methods |
|---|---|---|
| Online SLAM | \(p(x_t, m \mid z_{1:t}, u_{1:t})\) | EKF-SLAM, particle filter |
| Full SLAM | \(p(x_{1:t}, m \mid z_{1:t}, u_{1:t})\) | Graph optimization |
2 Filter-Based SLAM
2.1 EKF-SLAM
EKF-SLAM is the classic filtering approach to SLAM, maintaining a joint Gaussian distribution over the robot pose and all landmark positions.
State Vector:
\[
\hat{\mu}_t = \begin{pmatrix} x_t \\ m_1 \\ \vdots \\ m_N \end{pmatrix}, \qquad \hat{\Sigma}_t
\]
Prediction Step (Motion Update):
\[
\bar{\mu}_t = f(\hat{\mu}_{t-1}, u_t), \qquad
\bar{\Sigma}_t = F_t\, \hat{\Sigma}_{t-1}\, F_t^{\top} + R_t
\]
Update Step (Observation Update):
\[
K_t = \bar{\Sigma}_t H_t^{\top} \left( H_t \bar{\Sigma}_t H_t^{\top} + Q_t \right)^{-1}, \qquad
\hat{\mu}_t = \bar{\mu}_t + K_t \left( z_t - h(\bar{\mu}_t) \right), \qquad
\hat{\Sigma}_t = (I - K_t H_t)\, \bar{\Sigma}_t
\]
where \(F_t = \frac{\partial f}{\partial x}\big|_{\hat{\mu}_{t-1}}\) and \(H_t = \frac{\partial h}{\partial x}\big|_{\bar{\mu}_t}\) are the Jacobian matrices.
Limitations of EKF-SLAM:
- Computational complexity \(O(N^2)\) (\(N\) = number of landmarks), unsuitable for large-scale environments
- Linearization errors lead to inconsistency
- Data association errors are difficult to recover from
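The prediction/update cycle above can be sketched in a minimal 1-D setting (one robot coordinate, one landmark), so the 2×2 covariance algebra can be written out by hand. All names here are illustrative, not from any library:

```python
def ekf_slam_1d_step(mu, Sigma, u, z, q=0.01, r=0.04):
    """One EKF-SLAM cycle for the state mu = [robot, landmark].

    Motion: robot moves by u (F = I, process noise q on the robot entry).
    Observation: z = landmark - robot, so the Jacobian is H = [-1, 1].
    """
    # --- prediction (motion update) ---
    mu = [mu[0] + u, mu[1]]
    Sigma = [[Sigma[0][0] + q, Sigma[0][1]],
             [Sigma[1][0], Sigma[1][1]]]
    # --- update (observation update) ---
    y = z - (mu[1] - mu[0])                                 # innovation
    S = Sigma[0][0] - Sigma[0][1] - Sigma[1][0] + Sigma[1][1] + r  # H Sigma H^T + r
    K = [(Sigma[0][1] - Sigma[0][0]) / S,                   # Kalman gain Sigma H^T / S
         (Sigma[1][1] - Sigma[1][0]) / S]
    mu = [mu[0] + K[0] * y, mu[1] + K[1] * y]
    IKH = [[1 + K[0], -K[0]], [K[1], 1 - K[1]]]             # I - K H
    Sigma = [[IKH[0][0] * Sigma[0][0] + IKH[0][1] * Sigma[1][0],
              IKH[0][0] * Sigma[0][1] + IKH[0][1] * Sigma[1][1]],
             [IKH[1][0] * Sigma[0][0] + IKH[1][1] * Sigma[1][0],
              IKH[1][0] * Sigma[0][1] + IKH[1][1] * Sigma[1][1]]]
    return mu, Sigma


# Landmark starts with huge uncertainty; one observation pins it down.
mu, Sigma = [0.0, 0.0], [[0.01, 0.0], [0.0, 100.0]]
mu, Sigma = ekf_slam_1d_step(mu, Sigma, u=1.0, z=4.0)
```

After a single observation the landmark estimate collapses to roughly robot + z, and its variance drops from 100 to well under 0.1, which is exactly the joint-Gaussian behaviour the update equations describe.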
2.2 Particle Filter SLAM (FastSLAM)
FastSLAM (Montemerlo et al., 2002) uses Rao-Blackwellized particle filtering to decompose the SLAM problem:
\[
p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) = p(x_{1:t} \mid z_{1:t}, u_{1:t}) \prod_{j=1}^{N} p(m_j \mid x_{1:t}, z_{1:t})
\]
- Particles represent the path posterior \(p(x_{1:t} \mid \cdot)\)
- Each particle maintains an independent map (EKF update for each landmark)
Advantages:
- Computational complexity \(O(M \log N)\) (\(M\) = number of particles, \(N\) = number of landmarks)
- Naturally handles multimodal distributions
- Data association can be handled independently per particle
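The resampling step at the heart of a Rao-Blackwellized particle filter can be sketched as follows; `low_variance_resample` is an illustrative implementation of the standard systematic resampling scheme (Thrun et al., 2005), not FastSLAM's mapping code:

```python
import random


def low_variance_resample(particles, weights):
    """Systematic (low-variance) resampling.

    Draws a single random offset, then selects particles at evenly
    spaced points on the cumulative weight distribution, so high-weight
    particles are duplicated and low-weight ones tend to vanish.
    """
    n = len(particles)
    step = sum(weights) / n
    r = random.uniform(0.0, step)
    out, c, i = [], weights[0], 0
    for m in range(n):
        u = r + m * step
        while u > c:
            i += 1
            c += weights[i]
        out.append(particles[i])
    return out


random.seed(0)
parts = ["a", "b", "c", "d"]
ws = [0.7, 0.1, 0.1, 0.1]   # particle "a" carries most of the weight
res = low_variance_resample(parts, ws)
```

Because the sample points are evenly spaced, the dominant particle is guaranteed multiple copies, while a naive independent draw could miss it by chance.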
3 Visual SLAM
3.1 Overview
Visual SLAM uses cameras as the primary sensor, classified into:
- Feature-based: Extract feature points, estimate pose based on geometric constraints
- Direct: Directly use pixel intensity values to optimize pose
- Semi-direct: Combines advantages of both (e.g., SVO)
3.2 ORB-SLAM3 System
ORB-SLAM3 (Campos et al., 2021) is one of the most complete open-source visual SLAM systems, supporting monocular, stereo, RGB-D, and visual-inertial modes.
System Pipeline:
```mermaid
graph TB
A["Image Input"] --> B["ORB Feature Extraction"]
B --> C["Feature Matching"]
C --> D["Tracking Thread<br/>(Frame-to-Frame Pose Estimation)"]
D --> E["Keyframe Decision"]
E -->|"Is Keyframe"| F["Local Mapping Thread<br/>(Local BA)"]
F --> G["Loop Closure Thread<br/>(DBoW2)"]
G -->|"Loop Detected"| H["Pose Graph Optimization"]
H --> I["Global BA"]
D --> J["Pose Output"]
F --> K["Map Point Cloud"]
style D fill:#e8f4fd,stroke:#2196F3
style F fill:#fff3e0,stroke:#FF9800
style G fill:#f3e5f5,stroke:#9C27B0
```
Core Modules:
1. Feature Extraction
Uses ORB (Oriented FAST and Rotated BRIEF) features:
- FAST corner detection + orientation computation
- rBRIEF descriptor (256-bit binary)
- Image pyramid for scale invariance
- Fast extraction; matching uses Hamming distance
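Matching binary descriptors by Hamming distance, with Lowe's ratio test to reject ambiguous matches, can be sketched in a few lines (toy 8-bit descriptors stand in for ORB's 256-bit ones; `match` is an illustrative helper, not an OpenCV API):

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match(desc1, desc2, ratio=0.8):
    """Nearest-neighbour matching with a ratio test.

    Returns (i, j) index pairs where the best candidate in desc2 is
    sufficiently better than the second best, rejecting ambiguous
    matches the way feature-based SLAM front ends typically do.
    """
    matches = []
    for i, d in enumerate(desc1):
        order = sorted(range(len(desc2)), key=lambda j: hamming(d, desc2[j]))
        best, second = order[0], order[1]
        if hamming(d, desc2[best]) < ratio * hamming(d, desc2[second]):
            matches.append((i, best))
    return matches


# Toy 8-bit "descriptors": 0b11110000 is one bit away from 0b11110001.
d1 = [0b11110000, 0b00001111]
d2 = [0b11110001, 0b00001110, 0b10101010]
pairs = match(d1, d2)
```

Because the descriptors are binary, the distance is a single XOR plus popcount, which is why ORB matching is so much cheaper than L2 matching of float descriptors like SIFT.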
2. Tracking
- Constant velocity model for initial pose prediction
- Feature matching + PnP solving
- Local map tracking: match against local map points, optimize current pose
- Keyframe selection strategy
3. Local Mapping
- New keyframe insertion
- Map point triangulation
- Local Bundle Adjustment:
\[
\min_{\{x_i\},\,\{p_j\}} \sum_{(i,j)} \rho\!\left( \left\| z_{ij} - \pi(x_i, p_j) \right\|^2 \right)
\]
where \(\pi\) is the projection function and \(\rho\) is a robust kernel function (e.g., Huber).
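A minimal sketch of the Huber kernel \(\rho\), applied to a squared reprojection error (`huber` is an illustrative name, not a library function):

```python
import math


def huber(e2, delta=1.0):
    """Huber kernel applied to a squared error e2 = ||e||^2.

    Quadratic for small errors, linear beyond delta, which caps the
    influence of outlier matches on the bundle adjustment.
    """
    e = math.sqrt(e2)
    if e <= delta:
        return e2
    return 2.0 * delta * e - delta * delta
```

An inlier error of 0.5 keeps its quadratic cost, while a gross outlier with squared error 9.0 is penalized only linearly, so a few bad associations cannot dominate the optimization.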
4. Loop Closure
- Bag-of-words model (DBoW2) for image similarity retrieval
- Geometric verification (RANSAC + Sim(3)/SE(3))
- Pose graph optimization (correcting accumulated drift)
- Optional global BA
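The similarity retrieval step can be sketched on bag-of-words vectors, assuming the normalized L1 score used in DBoW2-style systems (vocabulary words and weights here are toy values):

```python
def l1_score(v1, v2):
    """Similarity between two bag-of-words vectors (dict word -> weight).

    s = 1 - 0.5 * || v1/|v1|_1 - v2/|v2|_1 ||_1, which lies in [0, 1];
    1.0 means identical word distributions, 0.0 means disjoint words.
    """
    n1 = sum(abs(w) for w in v1.values())
    n2 = sum(abs(w) for w in v2.values())
    words = set(v1) | set(v2)
    diff = sum(abs(v1.get(k, 0.0) / n1 - v2.get(k, 0.0) / n2) for k in words)
    return 1.0 - 0.5 * diff


a = {"w1": 2.0, "w2": 1.0}
b = {"w1": 2.0, "w2": 1.0}   # same word distribution -> score 1.0
c = {"w3": 1.0}              # disjoint words        -> score 0.0
```

A loop closure candidate is then only accepted if its score beats a threshold (often relative to the best score among recent frames), before the geometric verification stage above.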
3.3 Visual-Inertial SLAM (VIO)
Fusing IMU pre-integration: between keyframes \(i\) and \(j\), gyroscope and accelerometer samples are summarized into relative motion terms that do not depend on the absolute state:
\[
\Delta R_{ij} = \prod_{k=i}^{j-1} \mathrm{Exp}\big((\omega_k - b_g)\,\Delta t\big), \quad
\Delta v_{ij} = \sum_{k=i}^{j-1} \Delta R_{ik}\,(a_k - b_a)\,\Delta t, \quad
\Delta p_{ij} = \sum_{k=i}^{j-1} \Big[ \Delta v_{ik}\,\Delta t + \tfrac{1}{2}\,\Delta R_{ik}\,(a_k - b_a)\,\Delta t^2 \Big]
\]
where \(b_g, b_a\) are the gyroscope and accelerometer biases.
VIO systems maintain robustness in texture-poor and fast-motion scenarios.
4 LiDAR SLAM
4.1 Overview
LiDAR SLAM uses laser scanners to obtain precise 3D point clouds (see Sensors), suitable for large-scale outdoor environments.
4.2 LOAM (Lidar Odometry and Mapping)
LOAM (Zhang & Singh, 2014) is one of the most influential LiDAR SLAM algorithms.
Core Idea: Decompose the problem into a high-frequency odometry module and a low-frequency mapping module.
Feature Extraction:
- Edge points: Points with high curvature
- Planar points: Points with low curvature
Point-to-Edge/Plane Distance Minimization:
Edge point residual (distance from a point \(p\) to the line through edge points \(p_a, p_b\)):
\[
d_e = \frac{\left\| (p - p_a) \times (p - p_b) \right\|}{\left\| p_a - p_b \right\|}
\]
Planar point residual (distance from \(p\) to the plane through planar points \(p_a, p_b, p_c\)):
\[
d_p = \frac{\left| (p - p_a) \cdot \big( (p_b - p_a) \times (p_c - p_a) \big) \right|}{\left\| (p_b - p_a) \times (p_c - p_a) \right\|}
\]
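These residuals reduce to standard point-to-line and point-to-plane distances, which can be sketched with plain vector algebra (helper names are illustrative):

```python
import math


def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])


def sub(u, v):
    return (u[0]-v[0], u[1]-v[1], u[2]-v[2])


def norm(u):
    return math.sqrt(u[0]**2 + u[1]**2 + u[2]**2)


def point_to_line(p, pa, pb):
    """Distance from p to the line through edge points pa, pb."""
    return norm(cross(sub(p, pa), sub(p, pb))) / norm(sub(pa, pb))


def point_to_plane(p, pa, pb, pc):
    """Distance from p to the plane spanned by planar points pa, pb, pc."""
    n = cross(sub(pb, pa), sub(pc, pa))   # plane normal
    return abs(sum(x * y for x, y in zip(sub(p, pa), n))) / norm(n)


# Point one unit off the x-axis; point two units above the xy-plane.
d_edge = point_to_line((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
d_plane = point_to_plane((0.0, 0.0, 2.0),
                         (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

In LOAM these scalar distances are the residuals that the odometry step drives to zero while solving for the scan-to-scan transform.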
4.3 Cartographer
Google Cartographer (Hess et al., 2016) is a widely used 2D/3D SLAM system.
Front End:
- Submap construction: Uses scan matching to insert new scans into submaps
- Local optimization: Ceres Solver for scan matching
Back End:
- Loop closure detection: Branch-and-bound search
- Pose graph optimization: Sparse Pose Adjustment (SPA)
4.4 Other LiDAR SLAM Systems
| System | Features |
|---|---|
| LeGO-LOAM | Ground segmentation + lightweight LOAM |
| LIO-SAM | Tightly-coupled LiDAR-IMU, factor graph optimization |
| FAST-LIO2 | Tightly-coupled, ikd-Tree acceleration, highly efficient |
| CT-ICP | Continuous-time ICP, handles motion distortion |
5 Graph-Based SLAM
5.1 Pose Graph Optimization
Modeling SLAM as a graph optimization problem:
- Nodes: Robot poses \(x_i\)
- Edges: Relative constraints between poses \(z_{ij}\) (odometry, loop closure)
Optimization Objective:
\[
x^* = \operatorname*{arg\,min}_{x} \sum_{(i,j)} e_{ij}^{\top}\, \Omega_{ij}\, e_{ij}
\]
where the error function
\[
e_{ij} = z_{ij} \ominus \hat{z}_{ij}(x_i, x_j)
\]
compares the measured relative transform \(z_{ij}\) with the one predicted from the current pose estimates, and \(\Omega_{ij}\) is the information matrix (inverse of covariance).
5.2 Factor Graphs
Factor graphs are a more general graphical model representation:
- Variable nodes: States to be estimated (poses, landmarks, sensor biases, etc.)
- Factor nodes: Constraints (priors, odometry, observations, loop closures)
Common Factor Types:
| Factor | Connected Variables | Information Source |
|---|---|---|
| Prior Factor | \(x_0\) | Initial pose |
| Odometry Factor | \(x_{t-1}, x_t\) | Wheel/visual/inertial odometry |
| Landmark Observation Factor | \(x_t, l_j\) | Camera/LiDAR observation |
| Loop Closure Factor | \(x_i, x_j\) | Loop closure detection |
| IMU Pre-integration Factor | \(x_i, x_j, v_i, v_j, b_i\) | IMU data |
| GPS Factor | \(x_t\) | GPS localization |
5.3 Solution Methods
The graph optimization problem ultimately reduces to nonlinear least squares:
\[
x^* = \operatorname*{arg\,min}_{x} \sum_{k} \left\| e_k(x) \right\|_{\Sigma_k^{-1}}^2
\]
Solved iteratively using Gauss-Newton or Levenberg-Marquardt: linearize \(e_k(x \boxplus \Delta x) \approx e_k + J_k \Delta x\), solve the normal equations
\[
\left( J^{\top} \Sigma^{-1} J \right) \Delta x = -J^{\top} \Sigma^{-1} e
\]
and apply the update \(x \leftarrow x \boxplus \Delta x\); Levenberg-Marquardt additionally damps the system with a \(\lambda I\) term.
Sparsity
The information matrix \(H = J^T \Sigma^{-1} J\) in SLAM problems is sparse (because only adjacent/co-visible variables have constraints between them). Efficient solving is achieved through sparse Cholesky decomposition.
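As a minimal worked example of the normal equations \(H\,\Delta x = -b\): a 1-D pose graph with three poses, two odometry edges, and one loop-closure edge. The residuals are linear here, so a single Gauss-Newton step gives the exact minimum (`solve_1d_pose_graph` is a hypothetical helper, not a g2o/GTSAM API):

```python
def solve_1d_pose_graph(odo01, odo12, loop02):
    """One Gauss-Newton step for a 3-pose 1-D pose graph, x0 fixed at 0.

    Edge residuals (unit information matrices):
      e1 = x1 - odo01, e2 = (x2 - x1) - odo12, e3 = x2 - loop02.
    Linearizing at x1 = x2 = 0 with Jacobian rows [1,0], [-1,1], [0,1]
    gives H = J^T J = [[2,-1],[-1,2]] and b = J^T e; solving H dx = -b
    (2x2 Cramer's rule) yields the exact least-squares answer.
    """
    H = [[2.0, -1.0], [-1.0, 2.0]]
    e = [-odo01, -odo12, -loop02]
    b = [e[0] - e[1], e[1] + e[2]]
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    x1 = (H[1][1] * -b[0] - H[0][1] * -b[1]) / det
    x2 = (H[0][0] * -b[1] - H[1][0] * -b[0]) / det
    return x1, x2


# Odometry says each step is 1.0 m, but the loop closure measures only
# 1.8 m in total; optimization spreads the 0.2 m discrepancy over edges.
x1, x2 = solve_1d_pose_graph(1.0, 1.0, 1.8)
```

Note that \(H\) couples only poses that share an edge: the zero pattern of the example's \(H\) is exactly the sparsity that sparse Cholesky factorization exploits at scale.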
Common Optimization Libraries:
- g2o: General graph optimization framework
- GTSAM: Factor graph-based, supports incremental optimization (iSAM2)
- Ceres Solver: General nonlinear least squares solver
6 Learning-Enhanced SLAM
6.1 Depth Estimation
- MonoDepth2: Self-supervised monocular depth estimation
- DPT: Transformer-based depth prediction
6.2 Feature Extraction and Matching
- SuperPoint: Self-supervised keypoint detection
- SuperGlue: Graph Neural Network-based feature matching
- LightGlue: Lightweight feature matching
6.3 End-to-End Learned SLAM
DROID-SLAM (Teed & Deng, 2021):
- Dense matching based on RAFT optical flow network
- Differentiable Bundle Adjustment layer
- End-to-end training with strong generalization
The refined correspondences drive a weighted reprojection objective,
\[
E(G, d) = \sum_{(i,j)} \left\| p_{ij}^* - \Pi\big( G_{ij} \circ \Pi^{-1}(p_i, d_i) \big) \right\|_{\Sigma_{ij}}^2
\]
where \(p_{ij}^*\) are network-predicted correspondences, \(\Pi\) is the camera projection, \(G_{ij}\) the relative pose, and \(d_i\) the inverse depth; gradient propagation is achieved through differentiable optimization by unrolling iterations.
6.4 Implicit Representations
- iMAP: Uses NeRF as the map representation
- NICE-SLAM: Hierarchical implicit representation
- SplaTAM: 3D Gaussian Splatting-based SLAM
7 SLAM Evaluation Metrics
7.1 Trajectory Accuracy
Absolute Trajectory Error (ATE): after aligning the estimated trajectory \(P_{1:T}\) to the ground truth \(Q_{1:T}\) with a transform \(S\),
\[
\text{ATE} = \left( \frac{1}{T} \sum_{t=1}^{T} \left\| \operatorname{trans}\big( Q_t^{-1} S P_t \big) \right\|^2 \right)^{1/2}
\]
Relative Pose Error (RPE): compares relative motion over a fixed interval \(\Delta\),
\[
E_t = \big( Q_t^{-1} Q_{t+\Delta} \big)^{-1} \big( P_t^{-1} P_{t+\Delta} \big)
\]
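The ATE RMSE computation can be sketched for two already-aligned 2-D trajectories (`ate_rmse` is an illustrative helper, not evo's API; the alignment step, e.g. Umeyama, is assumed to have been done beforehand):

```python
import math


def ate_rmse(gt, est):
    """RMSE of the absolute trajectory error for aligned trajectories.

    gt and est are equal-length lists of (x, y) positions; the error at
    each timestamp is the Euclidean distance between the matched poses.
    """
    assert len(gt) == len(est)
    se = sum((g[0] - e[0]) ** 2 + (g[1] - e[1]) ** 2
             for g, e in zip(gt, est))
    return math.sqrt(se / len(gt))


gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
est = [(0.0, 0.1), (1.0, -0.1), (2.0, 0.1)]  # 0.1 m error at each pose
err = ate_rmse(gt, est)
```

With a constant 0.1 m offset at every timestamp the RMSE is exactly 0.1 m, which is the single-number trajectory accuracy typically reported on the datasets below.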
7.2 Standard Datasets
| Dataset | Sensors | Scenario |
|---|---|---|
| KITTI | Stereo + LiDAR + GPS | Outdoor driving |
| EuRoC | Stereo + IMU | Indoor drone |
| TUM-RGBD | RGB-D | Indoor handheld |
| Newer College | Multi-LiDAR + Camera | Outdoor walking |
7.3 Evaluation Tools
- evo: Python trajectory evaluation toolkit
- rpg_trajectory_evaluation: Evaluation tool from the Robotics and Perception Group, University of Zurich
8 System Integration
SLAM systems typically run under the ROS/ROS2 framework (see ROS2), coordinating with the navigation stack:
- SLAM provides pose estimation and maps
- The navigation stack performs path planning based on the map
- Controllers execute motion commands
Sensor selection and calibration are crucial for SLAM performance (see Sensors).
References
- Thrun, S., Burgard, W. & Fox, D. (2005). Probabilistic Robotics. MIT Press.
- Cadena, C. et al. (2016). Past, present, and future of simultaneous localization and mapping. IEEE TRO.
- Campos, C. et al. (2021). ORB-SLAM3: An accurate open-source library for visual, visual-inertial and multi-map SLAM. IEEE TRO.
- Zhang, J. & Singh, S. (2014). LOAM: Lidar Odometry and Mapping in Real-time. RSS.
- Teed, Z. & Deng, J. (2021). DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras. NeurIPS.
- Grisetti, G. et al. (2010). A tutorial on graph-based SLAM. IEEE ITSM.