Visual-SLAM Developer Roadmap - 2026
Visual-SLAM is a special case of Simultaneous Localization and Mapping (SLAM) in which a camera is used to gather exteroceptive sensory data.
Below is a set of topics you need to understand in Visual-SLAM, ranging from absolute-beginner level to what you need to become a Visual-SLAM engineer or researcher.
Visual-SLAM is often portrayed as a rather difficult topic: many think that good C++ programming skills and a deep understanding of mathematics are necessary.
On the other hand, there are not many courses provided for beginners, especially in non-English languages.
I made these charts to share my thoughts and experience of studying Visual-SLAM, and I hope they help beginners get a sense of where to start.
Purpose of these Roadmaps
The purpose of these roadmaps is to give you a general overview of Visual-SLAM and to guide you if you are unsure where to start.
SLAM does have a relatively high entry barrier. This is not because it requires understanding difficult mathematics, but because it requires you to pick up many different kinds of skills. Don't feel overwhelmed: you don't need to learn everything when you are just getting started. Instead, enjoy the journey itself and progress topic by topic. The result will be very rewarding.
| Level | Topic | Focus |
| --- | --- | --- |
| 1 | Beginner | Programming, Math, Projective Geometry, Camera, Image |
| 2 | Getting Familiar | Feature matching, MVG, Optimization, Factor Graph, Mapping, Sensors |
| 3 | Monocular SLAM | Feature/Direct/Hybrid/Learning-based, Foundation Model, Neural Representation, Semantic |
| 4 | RGB-D SLAM | KinectFusion, ElasticFusion, BundleFusion, DSP-SLAM |
| 5 | Deep Learning + SLAM | A. Frontend · B. Backend · C. Systems · D. Scene Understanding |
| 6 | VIO / VINS | Filter-based (MSCKF) vs Optimization-based (VINS-Mono, OKVIS2-X) |
| 7 | Stereo SLAM | S-PTAM, ORB-SLAM2/3 stereo, LDSO |
| 8 | Collaborative SLAM | CCM-SLAM, Kimera-Multi, Swarm-SLAM |
| 9 | LiDAR & Visual-LiDAR | LOAM, FAST-LIO2, LVI-SAM, R3LIVE, FAST-LIVO2 |
| 10 | Event Camera SLAM | EVO, Ultimate-SLAM, DEVO |
| 11 | World Models & Spatial AI | GAIA-1, Cosmos, VLM/VLA, Generative 3D |
Level 1: Beginner
C++ : Pointer, OOP
Python
Bash/Linux : Basic terminal usage
Basic Probability & Statistics : Gaussian distribution, Bayes' theorem
Basic Linear Algebra : Vectors & Matrices, Determinant, Dot & Cross product, Rank, Inverse matrix, Transpose matrix, SVD, Eigenvalues/Eigenvectors
Logarithm & Exponential
Basic Calculus : Differentiation, Taylor expansion
Pinhole camera model → Image projection
Camera calibration : Intrinsic/Extrinsic parameters, Lens distortion
Rigid body motion : Euler/Quaternion/Rotation Matrix, Projective space & Vanishing point, Homogeneous transformation
Epipolar geometry → Essential & Fundamental matrix
Triangulation
Lens, Sensor, Resolution/ISO/Aperture
Colour image, Resolution, Grayscale image
Thresholding, Gaussian blur
Corner detector : Harris corner
Edge detector : Sobel & Canny Edge
Stereovision, RGB-D, Disparity, Depth
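The pinhole camera model and image projection listed above can be sketched in a few lines. This is a minimal illustration only: the intrinsics (`fx`, `fy`, `cx`, `cy`) and the sample point are made-up values, the function names are hypothetical, and lens distortion is ignored.

```python
def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point (X, Y, Z) in the camera frame to pixel (u, v)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division, then scale by focal length and shift by the principal point.
    return fx * x / z + cx, fy * y / z + cy


def backproject_pinhole(pixel, depth, fx, fy, cx, cy):
    """Invert the projection for a pixel whose depth Z is known (e.g. from RGB-D)."""
    u, v = pixel
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth


if __name__ == "__main__":
    K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)  # hypothetical intrinsics
    uv = project_pinhole((0.2, -0.1, 2.0), **K)
    print("pixel:", uv)
    print("back-projected:", backproject_pinhole(uv, 2.0, **K))
```

Back-projection needs the depth as an extra input because a single pixel only constrains a ray, which is why a monocular image alone cannot give 3D structure.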
Level 2: Getting Familiar with SLAM
C++ : OOP, Modern C++, Data structures & Algorithms, Compilers, CMake/Makefile/Ninja, Design patterns, OpenCV C++
C
Git/GitHub
OpenCV (opencv-python)
Python : Deep learning, Graph plots, System scripts
Bash/Linux : ssh, CLI text editor/Vim/tmux
Concurrency : SIMD-SSE/AVX/Neon, OpenMP, CUDA
Mobile : Android (Java/Kotlin), iOS (Objective-C/Swift)
Maths library : Eigen, Ceres-solver/GTSAM/g2o
C++/Python interop : PyBind11, nanobind
Docker
C# : COLMAP, Unity AR, Microsoft Hololens
CI/CD : GitHub Actions, Apache Airflow
ROS/ROS2
Simulation : Gazebo, Isaac Sim
Keypoints → Detector/Descriptor
SIFT, FAST, ORB, AKAZE
Deep features: R2D2, SuperPoint
Image pyramid, oFAST, rBRIEF
Brute-Force, FLANN, Kd-Tree
LSH, Multi-probe LSH, HBST
SuperGlue
Bag of Visual Words, NetVLAD
Deep image retrieval, Hierarchical localization
Optical flow, KLT Tracker
2D-2D correspondence : Essential/Fundamental, Homography
2D-3D correspondence : P3P, PnP, SVD
3D-3D correspondence : ICP
RANSAC, PROSAC, M-Estimator, MAXCON, Convex relaxation
Least Squares Optimisation
Reprojection error, Bundle adjustment
Non-linear optimisation, Lie algebra
Lie groups : SO(3), SE(3)
Gauss-Newton, Levenberg-Marquardt
Pose graph optimization
Schur complement / Sparsity
Proprioceptive sensor : IMU, Wheel
Odometry (pose)
Exteroceptive sensor : Camera, LiDAR
Landmark (Map)
Joint optimisation, MLE & MAP
Factor Graph Optimisation
Point cloud, Occupancy grid mapping, TSDF, Surfel, Voxel map
Camera device : Wide/telecentric lens, Lens MTF, CCD/CMOS, Rolling/Global shutter, Exposure/ISO, Stereovision, RGB-D, Structured light, Active IR/ToF
LiDAR → Visual-LiDAR fusion
IMU → VIO
RADAR → Sensor fusion, Extended Kalman filter
Sonar
Multi-sensor calibration : Camera-IMU, Camera-LiDAR
Metrics : ATE (Absolute Trajectory Error), RPE (Relative Pose Error)
Datasets : KITTI, TUM RGB-D, EuRoC
Monocular SLAM · VIO/VINS · Stereo SLAM · Visual-LiDAR Fusion · RGB-D SLAM · Collaborative SLAM · Deep SLAM/Localization
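As a rough sketch of the ATE and RPE metrics listed above, the following computes both on trajectories given as lists of positions. It assumes the estimated trajectory is already time-synchronised with and aligned to the ground truth (real evaluation tools first solve for that alignment), and the RPE here is a simplified translation-only variant that ignores orientation. The function names are hypothetical.

```python
import math


def ate_rmse(est, gt):
    """RMSE of position differences between two already-aligned trajectories."""
    if not est or len(est) != len(gt):
        raise ValueError("trajectories must be non-empty and of equal length")
    sq = [sum((e - g) ** 2 for e, g in zip(p, q)) for p, q in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))


def rpe_trans_rmse(est, gt, delta=1):
    """Translation-only RPE: compare the displacement over `delta` frames."""
    if len(est) != len(gt) or len(est) <= delta:
        raise ValueError("need equal-length trajectories longer than delta")
    sq = []
    for i in range(len(est) - delta):
        d_est = [b - a for a, b in zip(est[i], est[i + delta])]
        d_gt = [b - a for a, b in zip(gt[i], gt[i + delta])]
        sq.append(sum((x - y) ** 2 for x, y in zip(d_est, d_gt)))
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]  # slow sideways drift
    print("ATE RMSE:", ate_rmse(est, gt))
    print("RPE RMSE:", rpe_trans_rmse(est, gt))
```

ATE grows with accumulated drift, while RPE measures the local error per step, so the two are usually reported together.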
Level 3: Monocular Visual-SLAM
VO vs SLAM — VO is local (no loop closure), SLAM includes global map + loop closure
Scale ambiguity — Fundamental limitation of monocular SLAM; absolute scale is unrecoverable from images alone
Covisibility graph — Shared map point visibility between keyframes; core data structure in ORB-SLAM
Visual Place Recognition (VPR) — Recognising previously visited places for loop closure
Self-supervised depth — Learning monocular depth without ground truth (Monodepth2, Godard 2019)
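The scale ambiguity above can be checked numerically: scaling every 3D point and every camera position by the same factor leaves all pixel observations unchanged. The sketch below assumes cameras with identity rotation and made-up intrinsics, and the helper names are hypothetical.

```python
def project(point_w, cam_pos, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection for a camera at cam_pos with identity rotation."""
    x = point_w[0] - cam_pos[0]
    y = point_w[1] - cam_pos[1]
    z = point_w[2] - cam_pos[2]
    return f * x / z + cx, f * y / z + cy


def scale_all(vectors, s):
    """Multiply every coordinate of every vector by s."""
    return [tuple(s * c for c in v) for v in vectors]


def observations(points, cams):
    """Pixel observations of all points from all cameras."""
    return [[project(p, c) for p in points] for c in cams]


if __name__ == "__main__":
    points = [(0.5, 0.2, 4.0), (-1.0, 0.3, 6.0), (0.1, -0.4, 3.0)]
    cams = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.1)]
    original = observations(points, cams)
    scaled = observations(scale_all(points, 2.5), scale_all(cams, 2.5))
    max_diff = max(
        abs(a - b)
        for row_o, row_s in zip(original, scaled)
        for pix_o, pix_s in zip(row_o, row_s)
        for a, b in zip(pix_o, pix_s)
    )
    print("largest pixel difference after scaling the world:", max_diff)
```

Because the images are identical for any scale factor, metric scale has to come from another source such as a stereo baseline, an IMU, or a known object size.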
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| Visual Odometry | Nister 2004 | Fundamental matrix, Triangulation, VO (local-only, no loop closure) |
| MonoSLAM | Davison 2007 | First real-time monocular SLAM, EKF-based, single camera, sparse 3D map, probabilistic feature initialization |
| PTAM | Klein & Murray 2007 | FAST feature, Tracking, Frontend/Backend separation, Parallel threads, Keyframe, Mapping, Bundle adjustment, Manual initialisation |
| Visual SLAM: Why Filter? | Strasdat 2012 | Bundle adjustment, Scale-aware BA, Motion-only BA |
| ORB-SLAM | Mur-Artal 2015 | ORB keypoint, Automatic initialisation (Homography vs Fundamental selection), Tracking thread, Sliding-window BA, Local mapping, Large-scale, Loop closure, Bag of visual words, Global optimisation, Covisibility graph, Map point management (culling, merging) |
| Pop-up SLAM | Yang 2016 | Line/Plane features |
| PL-SLAM | Pumarola 2017 | Point/Line features |
| ORB-SLAM2 | Mur-Artal 2017 | → Stereo SLAM, → RGB-D SLAM |
| CubeSLAM | Yang 2019 | Monocular 3D cuboid detection + SLAM, 9-DoF object representation |
| OpenVSLAM | Sumikura 2019 | — |
| Stella-VSLAM | (fork) 2021 | OpenVSLAM successor, license reboot |
| UcoSLAM | Munoz-Salinas 2019 | Fiducial markers |
| DeepFusion | Laidlow 2019 | — |
| ORB-SLAM3 | Campos 2020 | Monocular + Stereo + VIO, Multi-map, IMU integration |
| DXSLAM | Li 2020 | Deep features for SLAM |
| PyCuVSLAM | NVIDIA 2026 | Python + CUDA GPU-accelerated VSLAM toolkit (cuVSLAM wrapper) |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| DTAM | Newcombe 2011 | Dense mapping, Keyframe mapping, GPGPU |
| LSD-SLAM | Engel 2014 | Photometric error minimisation, High gradient pixels/edges, Large scale, Loop closure, Pose graph optimisation |
| DSO | Engel 2016 | Photometric bundle adjustment, Sliding window BA, No loop closure/global optimisation |
| LDSO | Gao 2018 | DSO + Loop closure (BoW-based), addresses DSO's main weakness |
| CNN-SLAM | Tateno 2017 | Depth from LSD-SLAM + deep depth, Semantic label |
| DVSO | Yang 2018 | Deep single image depth estimation, StackNet |
| Basalt | Usenko 2020 | Non-linear factor recovery (→ primarily VIO, see Level 6) |
| D3VO | Yang 2020 | Deep single image depth estimation, Deep pose, Deep aleatoric uncertainty |
Hybrid (Feature + Direct)
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| SVO | Forster 2014 | FAST feature detection, Direct-based feature tracking, Bundle adjustment |
| SVO2 | Forster 2017 | Multi-camera/Fisheye, Probabilistic depth estimation, Direct method convergence, Sparse method |
| Stereo DSO | Wang 2017 | → Stereo SLAM |
| VI-DSO | von Stumberg 2018 | → VIO/VINS |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| DROID-SLAM | Teed 2021 | Differentiable BA, dense optical flow, end-to-end learned |
| TartanVO | Wang 2021 | Generalizable visual odometry |
| DPV-SLAM / DPVO | Teed 2023 | DROID-SLAM lightweight, patch-based visual odometry |
| MAC-VO | Qiu 2024 | Learning-based stereo VO, metrics-aware covariance |
| VoT | Yugay 2025 | Visual Odometry with Transformers |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| DUSt3R | Wang 2024 | Pointmap regression from image pairs, no calibration needed |
| MASt3R | Leroy 2024 | DUSt3R + local feature matching |
| MASt3R-SLAM | Murai 2024 | Real-time dense SLAM from MASt3R |
| VGGT | Wang (Meta) 2025 | Feed-forward inference of poses, depths, pointmaps, tracks from N views (CVPR 2025 Best Paper) |
| VGGT-SLAM | 2025 | VGGT as frontend for real-time SLAM |
| VGGT-SLAM 2.0 | 2026 | Improved VGGT-SLAM |
| VGGT-Geo | 2026 | Probabilistic geometric fusion of VGGT priors for dense indoor SLAM |
| IGGT | Li 2026 | VGGT + VLM — language-grounded 3D geometry |
| AMB3R | Wang 2025 | MASt3R frontend + Transformer backend for SfM/SLAM |
| MASt3R-Fusion | WHU 2025 | MASt3R-SLAM + IMU + GNSS fusion |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| InstantSfM | 2025 | GPU-accelerated SfM pipeline, 40× faster than COLMAP |
Neural Representation SLAM
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| iMAP | Sucar 2021 | First NeRF-SLAM, single MLP, real-time tracking/mapping |
| BARF | Lin 2021 | Bundle-Adjusting NeRF, coarse-to-fine positional encoding, joint pose+NeRF opt (not full SLAM — pose+NeRF co-optimization) |
| NICE-SLAM | Zhu & Peng 2022 | Hierarchical feature grid (coarse/mid/fine), scalable |
| Co-SLAM | Wang 2023 | Hash grid (Instant-NGP) + coordinate encoding, 5-10× faster than NICE-SLAM |
| ESLAM | Johari 2023 | Tri-plane representation, O(N²) vs O(N³) memory |
| Point-SLAM | Sandström 2023 | Neural point cloud based |
| NeRF-SLAM | Rosinol 2023 | NeRF + classical SLAM pipeline |
| NICER-SLAM | Zhu 2024 | RGB-only NeRF-SLAM (no depth sensor), monocular depth integration |
| vMAP | Kong 2023 | Object-level NeRF-SLAM, per-object neural fields |
| GO-SLAM | Zhang 2023 | Global optimization + NeRF-SLAM, loop closure + global BA |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| SplaTAM | Keetha 2024 | First 3DGS-SLAM, RGB-D, silhouette-guided densification |
| MonoGS | Matsuki 2024 | Monocular 3DGS-SLAM, depth network + triangulation fusion |
| GS-ICP SLAM | Yu 2024 | Gaussian-to-Gaussian ICP (Mahalanobis distance), geometric tracking |
| Photo-SLAM | Huang 2024 | Explicit geometry + implicit appearance (MLP color), anti-aliasing |
| RTG-SLAM | 2024 | Real-time focus, adaptive Gaussian budget, Jetson Orin 25 FPS |
| EGG-Fusion | ZJU 2025 | Gaussian surfel fusion, information-filter-based, real-time 24 FPS |
| Online-Mono-3DGS (MODP) | 2025 | ORB-SLAM3 tracking + Hierarchical Gaussian Management |
| ActiveSplat | Li 2025 | Active mapping with 3DGS + Voronoi-based path planning |
| Open-S3SLAM | 2026 | Open-set semantic 3DGS SLAM for smartphones (ICRA 2026) |
| LEGS | 2025 | Language Embedded Gaussian Splats, real-time language-queryable 3D |
Semantic / Language-Grounded SLAM
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| ConceptFusion | Jatavallabhula (MIT) 2023 | CLIP features fused into 3D map, open-vocabulary language queries |
| LERF | Kerr 2023 | Language Embedded Radiance Fields, DINO multi-scale, NeRF + CLIP |
| OpenScene | Peng (ETH) 2023 | Language features back-projected to 3D point clouds |
| ConceptGraphs | Gu 2023 | Open-vocabulary 3D Scene Graph, SAM + CLIP + LLM spatial relations |
| SpatialLLM | Mao 2025 | Point cloud → LLM, structured indoor modeling as Python scripts |

Also see: LEGS, Open-S3SLAM (3DGS-based section above); Open-YOLO 3D (Level 5 Object Detection)
Level 4: RGB-D Visual-SLAM
Intel RealSense D series
Microsoft Kinect v1/v2
Azure Kinect DK
Occipital Structure Core
Orbbec Astra
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| ICP | Besl & McKay 1992 | — |
| DTAM | Newcombe 2011 | — |
| KinectFusion | Newcombe 2011 | GPGPU, Tracking (project depth → 3D, surface normal, coarse-to-fine ICP), Mapping (volumetric integration, TSDF), Robust to small scene changes, Cannot model deformation, Map growth cubic, Room-size only |
| Double Window Optimisation | Strasdat 2011 | — |
| Kintinuous | Whelan 2012 | Volume shift, Geometric, Photometric, DBoW + SURF, Optimisation, Loop closure |
| RGBD-SLAM-V2 | Endres 2013 | Tracking (colour image, visual features, depth image, point cloud, transformation), Mapping (OctoMap 2013) |
| SLAM++ | Salas-Moreno 2013 | Object-oriented SLAM |
| DVO | Kerl 2013 | Keyframe, Depth, Direct method, Optimisation, Loop closure |
| RTAB-Map | Labbé 2014 | Loop closure, Map merge, Multi-session memory management |
| MRS-Map | Stückler 2014 | — |
| ElasticFusion | Whelan 2015 | Active: frame-to-model tracking (photometric + geometric), joint optimisation, fused surfel-based model reconstruction · Inactive: local loop closure (model-to-model local surface, submodel separation), global loop closure (randomised fern encoding, non-rigid space deformation) |
| DynamicFusion | Newcombe 2015 | 6D motion field, Deformable scene |
| ORB-SLAM2 | Mur-Artal 2016 | Bundle adjustment, Sparse reconstruction |
| BundleFusion | Dai 2016 | Local-to-global optimisation, Sparse RGB feature, Coarse global pose estimation, Fine pose refinement (geometric + photometric) |
| SemanticFusion | McCormac 2016 | Deep Learning CNN, Deep Semantic SLAM |
| InfiniTAM v3 | Prisacariu 2017 | Tracking (scene raycast, depth image, RGB image), Relocalisation (random ferns), Mapping (TSDF reconstruction, voxel hashing, surfel reconstruction) |
| Fusion++ | McCormac & Clark 2018 | Deep Learning CNN, Mask-RCNN instance segmentation, Object-level SLAM, No prior, Object-level TSDF reconstruction |
| PointFusion / DenseFusion | Xu 2018 / Wang 2019 | RGB-D object pose estimation, Tracking, Relocalisation, Loop closure detection |
| BAD SLAM | Schöps 2019 | Direct (photometric + geometric) bundle adjustment for RGB-D |
| RTAB-Map v2 | Labbé 2019 | RGB-D/LiDAR, Light-source detection (2016) |
| MoreFusion | Wada & Sucar 2020 | DL instance segmentation, Object-level volumetric fusion, Volumetric pose prediction, 3D scene reconstruction, Collision-based refinement, Semantic SLAM, Object pose estimation, CAD object fitting |
| NodeSLAM | Sucar & Wada 2020 | Occupancy VAE, Object-level SLAM (→ also in Level 5 Latent Representation) |
| Kimera / 3D Dynamic Scene Graph | Rosinol 2020 | Kimera-VIO, Kimera-Mesher, Kimera-PGMO, Kimera-Semantics, Kimera-DSG |
| DSP-SLAM | Wang (UCL) 2021 | DeepSDF shape prior + ORB-SLAM2, object-level dense reconstruction (mono/stereo/LiDAR) |
Level 5: Applying Deep Learning
Level 5 is organized into four pillars:
A. Frontend — learned perception components replacing hand-crafted modules
B. Backend — learned/certifiable optimization replacing classical solvers
C. Systems — end-to-end deep VO/SLAM pipelines
D. Scene Understanding — semantic, language, and relational reasoning on SLAM maps
A. Deep Frontend — Perception
Feature Detection & Matching
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| NetVLAD | Arandjelovic 2016 | VLAD, place recognition |
| SuperPoint | DeTone 2017 | Homographic Adaptation, Self-supervised, VGG encoder + detector/descriptor heads |
| HardNet | Mishchuk 2017 | Learned local descriptor |
| R2D2 | Revaud 2019 | Repeatable + Reliable detector/descriptor, explicit repeatability/reliability maps |
| KeyNet | Barroso-Laguna 2019 | Learned keypoint detector |
| HF-Net | Sarlin 2019 | Global feature, Local feature, Visual localization |
| SuperGlue | Sarlin 2020 | Self/Cross-attention GNN, Sinkhorn optimal assignment, dustbin for outliers |
| DISK | Tyszkiewicz 2020 | Policy gradient (RL) training, match success/failure as reward |
| Patch NetVLAD | Hausler 2021 | Multi-scale patch-level VLAD |
| LoFTR | Sun 2021 | Detector-free, Transformer coarse-to-fine dense matching |
| LightGlue | Lindenberger 2023 | Adaptive depth/width, 5-10× faster than SuperGlue |
| XFeat | Potje 2024 | 0.3M params, 1400 FPS (RTX 4090), 64-dim descriptor, embedded-friendly |
| RoMa | Edstedt 2024 | DINOv2 foundation feature + coarse-to-fine dense matching |
| DeDoDe | Edstedt 2024 | Decoupled detector and descriptor ("Detect, Don't Describe; Describe, Don't Detect") |
| RoMa V2 | Edstedt 2026 | Improved RoMa |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| MonoDepth | Godard 2016 | Left-Right photometric consistency, self-supervised |
| MiDaS | Ranftl 2020 | Multi-dataset mixing, scale-and-shift invariant loss, relative depth |
| DPT | Ranftl 2021 | Dense Prediction Transformer (ViT backbone), global context |
| ZoeDepth | Bhat 2023 | Zero-shot metric depth, Metric Bins Module |
| Metric3D | Yin 2023 | Camera intrinsic-conditioned metric depth, Canonical Camera Space |
| Depth Anything | Yang 2024 | 62M images, foundation model for monocular depth |
| Depth Anything V2 | Yang 2024 | Improved with synthetic data, better edge preservation |
| Marigold | Ke 2024 | Stable Diffusion for depth, fine detail, uncertainty via sampling |
| Align3R | Lu 2025 | Video temporal consistency, DUSt3R-based, CVPR 2025 Highlight |
| Masked Depth Modeling (LingBot-Depth) | 2026 | Fixes RGB-D failures on glass/mirrors/metal |
Optical Flow & Scene Flow
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| FlowNet | Dosovitskiy 2015 | First end-to-end deep optical flow (FlowNetSimple / FlowNetCorr) |
| FlowNet 2.0 | Ilg 2017 | Stacked networks, classical-level accuracy |
| PWC-Net | Sun 2018 | Pyramid-Warping-Cost volume, coarse-to-fine, 8.4M params |
| FlowNet3D | Liu 2019 | Point cloud scene flow, PointNet++ based |
| RAFT | Teed 2020 | All-Pairs Correlation + iterative ConvGRU update, ECCV Best Paper |
| RAFT-3D | Teed 2021 | Scene flow (3D motion) from RAFT |
| FlowFormer | Huang 2022 | Transformer on cost volume tokens, global context |
| SEA-RAFT | 2024 | Efficient RAFT variant for real-time |
Camera Pose Regression & Relocalization
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| PoseNet | Kendall 2015 | CNN-based 6-DoF pose regression (APR), GoogLeNet backbone |
| DSAC | Brachmann 2017 | Differentiable RANSAC, Scene Coordinate Regression (SCR) |
| DSAC++ | Brachmann 2018 | Self-supervision, RGB-D support |
| CNN Pose Regression Limitations | Sattler 2019 | Pose regression ≈ image retrieval performance |
| LM-Reloc | von Stumberg 2020 | Deep direct relocalization |
| DSAC* | Brachmann 2021 | Improved learning stability |
| ACE | Brachmann 2023 | Accelerated Coordinate Encoding, 5-min training per scene |
| ACE Zero | Brachmann 2024 | Zero-shot SCR, no pre-built 3D map needed |
| ACE-G | Brachmann 2024 | Generalizable SCR via cross-attention, new scenes without fine-tuning |
| ACE-SLAM | Tang 2024 | Neural implicit real-time SLAM, network weights = map |
| hloc | Sarlin 2019+ | Hierarchical Localization: coarse (NetVLAD) → fine (SuperGlue) pipeline |
Object Detection & Segmentation for SLAM
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| YOLO (v1→v11) | Redmon 2016→2024 | Real-time object detection, Ultralytics ecosystem |
| DETR | Carion 2020 | Transformer detection, anchor-free, no NMS |
| RT-DETR | Lv (Baidu) 2023 | Real-time DETR, YOLO-speed + Transformer quality |
| SAM | Kirillov 2023 | Segment Anything, prompt-based, Foundation Model |
| SAM 2 | Meta 2024 | Video segmentation, Memory Attention, temporal consistency |
| Grounding DINO | Liu 2023 | Text-prompted detection → SAM pipeline (Grounded SAM) |
| Open-YOLO 3D | Boudjoghra 2025 | 2D open-vocab detection → 3D instance seg, 16× faster |
B. Deep Backend — Optimization
Differentiable Bundle Adjustment
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| BA-Net | Tang 2019 | FPN + differentiable LM layer, end-to-end SfM (ICLR) |
| DROID-SLAM | Teed 2021 | Dense optical flow + differentiable dense BA, all-pixels reprojection |
| DPVO | Teed 2023 | Patch-based DROID-SLAM, 30+ FPS real-time |
| Theseus | Pineda (Meta) 2022 | Differentiable nonlinear optimization library (PyTorch) |
| LieTorch | Teed 2021 | Lie group operations for PyTorch (SE(3)/SO(3)) |
Certifiably Optimal Algorithms
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| SE-Sync | Rosen 2019 | Certifiable pose graph optimization via SDP + Riemannian opt |
| TEASER++ | Yang 2020 | Point cloud registration, 90%+ outlier robust, TLS + Max Clique (T-RO/RSS 2020) |
| GNC | Yang 2020 | Graduated Non-Convexity, continuation from convex → robust cost |
| QUASAR | Yang 2019 | Certifiable robust rotation search (Wahba problem with outliers), SDP + robust cost |
Gaussian Belief Propagation & Graph Processors
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| FutureMapping 1 | Davison 2018 | Computational structure of Spatial AI, GBP for SLAM |
| FutureMapping 2 | Ortiz 2019 | GBP as core Spatial AI primitive, visual intro to GBP |
| BA on Graph Processor | Ortiz 2020 | Bundle Adjustment on Graphcore IPU, tile-based parallelism |
| DANCeRS | 2023 | GBP-based distributed consensus in robot swarms |
C. End-to-End Deep VO / SLAM Systems
Self-supervised & Learned VO
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| DeepVO | Wang 2017 | Supervised learning |
| SfM-Learner | Zhou 2017 | Unsupervised, deep depth + deep pose |
| DeMoN | Ummenhofer 2017 | Depth + Motion from two frames, encoder-decoder |
| UnDeepVO | Li 2018 | Stereo self-supervised, absolute scale recovery |
| DeepTAM | Zhou 2018 | Deep tracking and mapping, cost volume based |
| DeepV2D | Teed 2018 | Iterative depth from video, differentiable geometry layers |
| Depth from Video in the Wild | Gordon 2019 | Unconstrained video depth, learned camera intrinsics |
| Neural Ray Surfaces | Vasiljevic 2020 | Learned ray surface model, non-pinhole cameras |
| GradSLAM | Murthy 2020 | Differentiable SLAM framework (PyTorch, supports multiple SLAM backends) |
| DeepSLAM | Wang 2020 | TrackingNet, MappingNet, LoopNet |
| MonoRec | Wimbauer 2021 | Self-supervised monocular 3D reconstruction, moving objects |
| TANDEM | Koestler 2021 | Real-time tracking + dense mapping via MVS depth, DSO-based |
| DROID-SLAM | Teed 2021 | Dense BA + correlation, SOTA on TartanAir/EuRoC (→ see Differentiable BA) |
| DPVO | Teed 2023 | Patch-based lightweight DROID (→ see Differentiable BA) |
Latent Representation SLAM
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| CodeSLAM | Bloesch 2018 | Depth as 128-dim latent code, photometric BA on codes + poses |
| SceneCode | Zhi 2019 | Depth + semantic in single latent code, cross-modal constraints |
| DeepFactors | Czarnowski 2020 | Probabilistic depth codes + factor graph, GPU 30+ FPS |
| NodeSLAM | Sucar 2020 | Object-level DeepSDF codes, occupancy VAE per object |
| CodeMapping | Matsuki 2021 | Sparse SLAM + learned dense mapping, hybrid approach |
Neural Rendering (reference)
NeRF/3DGS-based SLAM systems → see Level 3: Neural Representation SLAM
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| NeRF | Mildenhall 2020 | Neural Radiance Fields, novel view synthesis (foundational) |
| DIFIX3D+ | 2026 | Single-step diffusion for 3D reconstruction artifact removal (post-processing) |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| EFM3D | Straub (Meta) 2024 | Egocentric Foundation Model 3D benchmark, depth/surface/semantic from ego-video |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| Hydra | Hughes (MIT SPARK) 2022 | Real-time hierarchical Scene Graph (mesh→objects→places→rooms→buildings) |
| Hydra-Multi | Hughes 2023 | Distributed multi-robot 3D Scene Graph |
| Clio | Maggio (MIT SPARK) 2024 | Open-set task-driven Scene Graph, CLIP embeddings per node |
| Khronos | Schmid (MIT SPARK) 2024 | Spatio-temporal Scene Graph, dynamic object history tracking |
| ConceptGraphs | Gu 2023 | Open-vocabulary 3D Scene Graph, SAM + CLIP + LLM relations (→ also in L3 Semantic) |
Level 6: VIO / VINS
Tightly-coupled vs Loosely-coupled — Joint vs separate optimization of visual and inertial measurements
Filter-based vs Optimization-based — EKF approaches vs nonlinear optimization (BA)
IMU preintegration — On-manifold IMU integration between keyframes (Forster 2015)
IMU noise model — Bias, random walk, Allan variance
Observability — Yaw and global position are unobservable in VIO
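To see why the bias term in the IMU noise model matters, the sketch below double-integrates a stationary 1D accelerometer whose only output is an uncorrected constant bias. The bias value and sample rate are made up for illustration, and noise, gravity and orientation are ignored entirely, so this is dead reckoning rather than preintegration.

```python
def dead_reckon_1d(accel_samples, dt):
    """Integrate 1D accelerometer samples twice; returns (velocity, position)."""
    v = 0.0
    p = 0.0
    for a in accel_samples:
        p += v * dt + 0.5 * a * dt * dt  # constant acceleration within each step
        v += a * dt
    return v, p


if __name__ == "__main__":
    bias = 0.05  # m/s^2, hypothetical uncorrected accelerometer bias
    dt = 0.005   # 200 Hz IMU
    for seconds in (1, 10, 60):
        n = int(round(seconds / dt))
        # The sensor is stationary, so every sample is just the bias.
        _, drift = dead_reckon_1d([bias] * n, dt)
        print(f"{seconds:3d} s -> {drift:8.3f} m "
              f"(0.5*b*t^2 = {0.5 * bias * seconds ** 2:.3f} m)")
```

The position error grows quadratically with time, which is why VIO systems estimate the biases as part of the state and lean on visual measurements to bound the drift.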
| Resource | Author/Year | Key Concepts |
| --- | --- | --- |
| Introduction to Inertial Navigation | Woodman 2007 | IMU fundamentals, coordinate frames, error sources — essential prerequisite |
| IMU Preintegration on Manifold | Forster 2015 | On-manifold preintegration, bias correction without re-integration |
| Quaternion kinematics for error-state KF | Sola 2017 | Quaternion math, error-state formulation |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| MSCKF | Mourikis 2007 | Multi-State Constraint KF, efficient VIO without landmarks in state |
| ROVIO | Bloesch 2015 | Robocentric VIO, direct photometric tracking + EKF |
| OpenVINS | Geneva 2020 | Open-source MSCKF, modular, extensible |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| OKVIS | Leutenegger 2015 | Keyframe-based, tightly-coupled, sliding window optimization |
| VINS-Mono | Qin 2018 | Tightly-coupled, relocalization, loop closure, pose graph optimization |
| VINS-Fusion | Qin 2019 | Stereo + GPS fusion extension |
| maplab | Schneider 2018 | Multi-session visual-inertial mapping framework |
| Kimera-VIO | Rosinol 2020 | Fast VIO frontend for Kimera pipeline, structureless vision factors |
| Basalt | Usenko 2020 | Non-linear factor recovery, visual-inertial odometry + mapping |
| ORB-SLAM3 | Campos 2020 | VIO mode, multi-map, IMU initialization |
| DM-VIO | von Stumberg 2022 | Direct (photometric) monocular VIO, delayed marginalization |
| OKVIS2 | Leutenegger 2022 | Multi-session, improved marginalization |
| AirVO | Xu 2023 | Point-line VIO, illumination-robust |
| OKVIS2-X | Boche & Leutenegger 2025 | Multi-sensor SLAM (Visual+Inertial+Depth+LiDAR+GNSS), dense volumetric occupancy maps, submapping for large-scale (9km+), EuRoC/Hilti22 SOTA |
Level 7: Stereo SLAM
Stereo rectification — Epipolar alignment for efficient disparity search
Disparity vs Depth — d = f·B/Z, baseline determines depth range/accuracy
Scale observability — Stereo provides metric scale (unlike monocular)
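The disparity relation d = f·B/Z above can be turned around to recover depth, and differentiating it shows how depth uncertainty grows with distance. The sketch below uses made-up values for the focal length, baseline and disparity noise, and the function names are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: |dZ/dd| * sigma_d = Z^2 / (f * B) * sigma_d."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    f, B = 700.0, 0.12  # hypothetical focal length [px] and baseline [m]
    for d in (84.0, 42.0, 8.4, 4.2):
        z = depth_from_disparity(d, f, B)
        err = depth_error(z, f, B, disparity_error_px=0.5)
        print(f"disparity {d:5.1f} px -> depth {z:6.2f} m, +/- {err:.3f} m")
```

Because the error scales with Z², a short baseline is accurate only at close range, which is what "baseline determines depth range/accuracy" refers to.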
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| S-PTAM | Pire 2017 | Stereo PTAM, ROS-compatible, real-time |
| ORB-SLAM2 (stereo) | Mur-Artal 2016 | Stereo + RGB-D modes, loop closure, relocalization |
| StereoMSCKF | Sun 2018 | MSCKF with stereo, efficient for resource-constrained platforms |
| RTAB-Map | Labbé 2019 | Multi-sensor (stereo/RGB-D/LiDAR), memory management, large-scale |
| ORB-SLAM3 (stereo) | Campos 2020 | Multi-map, Atlas, stereo + IMU |
| Stella-VSLAM | Community 2022 | Open-source fork of OpenVSLAM, stereo support |
| LDSO | Gao 2018 | Direct sparse odometry with loop closure (DSO extension; monocular in the original paper, see Stereo DSO for the stereo variant) |
Level 8: Collaborative / Multi-Robot SLAM
Centralized vs Decentralized — Single server vs peer-to-peer map merging
Inter-robot loop closure — Place recognition across robots with different viewpoints
Communication constraints — Bandwidth-limited map sharing, sparse descriptors
Map merging — Aligning submaps from different robots into a global map
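As a toy illustration of map merging, the sketch below aligns two 2D landmark maps with a closed-form least-squares rigid transform. It assumes the landmark correspondences are already known (for example from inter-robot loop closures) and contain no outliers; real systems work in 3D and wrap a step like this in robust estimation. The function names are hypothetical.

```python
import math


def align_2d(src, dst):
    """Least-squares rigid transform (theta, tx, ty) with dst ~ R(theta) * src + t."""
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched points")
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    s_dot = 0.0
    s_cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy  # centre both sets
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(points, theta, tx, ty):
    """Apply the rigid transform to a list of 2D points."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


if __name__ == "__main__":
    map_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]  # robot A's landmarks
    map_b = transform(map_a, 0.3, 1.0, -2.0)                  # same landmarks in B's frame
    print(align_2d(map_a, map_b))
```

Once the transform is known, robot A's submap can be expressed in robot B's frame and the two merged into one global map.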
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| C2TAM | Riazuelo 2014 | Cloud-based collaborative monocular SLAM |
| CCM-SLAM | Schmuck & Chli 2019 | Centralized collaborative monocular SLAM, robust to comm failures |
| DOOR-SLAM | Lajoie 2020 | Distributed, outlier-resilient SLAM with pairwise consistency |
| Kimera-Multi | Tian 2022 | Distributed multi-robot metric-semantic SLAM, mesh reconstruction |
| Swarm-SLAM | Lajoie 2024 | Decentralized, sparse, scalable C-SLAM, supports LiDAR/stereo/RGB-D |
| CoPeD-Advancing | Stathoulopoulos 2024 | Multi-robot collaborative perception for autonomous exploration |
| maplab 2.0 | Cramariuc 2023 | Multi-session, multi-robot visual-inertial mapping |
Level 9: LiDAR & Visual-LiDAR Fusion SLAM
LiDAR-Visual-Inertial (LVI) — Triple fusion for robust outdoor SLAM
Tightly-coupled LiDAR-camera — Joint optimization of point cloud and visual features
Direct LiDAR-camera alignment — Photometric/geometric alignment without feature extraction
Degradation handling — Graceful fallback when one modality fails (e.g., LiDAR in rain, camera in darkness)
Range image — 2D projection of LiDAR scans for efficient processing (SuMa, RangeNet++)
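The range image idea above can be sketched as a spherical projection of each LiDAR point to a (row, column) cell. The image size and vertical field of view below are made-up values loosely modelled on a 64-beam sensor, and the pixel convention is one common choice rather than the exact one used by SuMa or RangeNet++.

```python
import math


def spherical_project(point, height=64, width=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map a 3D point to (row, col, range) in a range image."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("point at the sensor origin has no direction")
    yaw = math.atan2(y, x)    # azimuth, in [-pi, pi]
    pitch = math.asin(z / r)  # elevation
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    u = 0.5 * (1.0 - yaw / math.pi) * width
    v = (1.0 - (pitch - fov_down) / (fov_up - fov_down)) * height
    # Points outside the field of view are clamped to the border cells.
    col = min(width - 1, max(0, int(u)))
    row = min(height - 1, max(0, int(v)))
    return row, col, r


def build_range_image(points, height=64, width=1024):
    """Keep the closest range per cell; empty cells stay at infinity."""
    image = [[float("inf")] * width for _ in range(height)]
    for p in points:
        row, col, r = spherical_project(p, height, width)
        image[row][col] = min(image[row][col], r)
    return image


if __name__ == "__main__":
    scan = [(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
    img = build_range_image(scan)
    print(spherical_project(scan[0]), img[6][512])
```

With the scan stored as a 2D grid, neighbourhood lookups and 2D CNNs can be applied directly, which is the efficiency gain the bullet refers to.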
LiDAR / LiDAR-Inertial SLAM
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| LOAM | Zhang 2014 | LiDAR odometry and mapping (foundational), edge + planar features |
| SuMa | Behley (Bonn) 2018 | Surfel-based LiDAR SLAM, projective ICP on range images |
| SuMa++ | Chen (Bonn) 2019 | SuMa + RangeNet++ semantics, semantic ICP weighting, dynamic object filtering |
| LIO-SAM | Shan 2020 | Tightly-coupled LiDAR-inertial, factor graph, GPS fusion |
| FAST-LIO2 | Xu 2022 | Direct LiDAR-inertial, ikd-Tree, extremely fast |
| PIN-SLAM | Pan (Bonn) 2024 | Neural point cloud LiDAR SLAM, point-to-SDF registration, elastic map deformation for loop closure |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| LVI-SAM | Shan 2021 | LiDAR-Visual-Inertial via factor graph, LIO-SAM + VINS-Mono |
| R3LIVE | Lin 2022 | Real-time LiDAR-Visual-Inertial, dense RGB point cloud map |
| R3LIVE++ | Lin 2023 | Improved R3LIVE with mesh reconstruction |
| FAST-LIVO | Zheng 2022 | FAST-LIO + direct visual odometry, tightly-coupled LVI |
| FAST-LIVO2 | Zheng 2024 | Improved, sequential image processing, direct photometric fusion |
| OKVIS2-X | Boche 2025 | Visual+Inertial+Depth+LiDAR+GNSS configurable (also in Level 6) |
| Resource | Key Concepts |
| --- | --- |
| LiDAR-Visual-Inertial Survey (Zheng 2024) | Comprehensive survey of LVI SLAM systems |
Level 10: Event Camera SLAM
Event cameras (DVS) — Asynchronous per-pixel brightness change detection, μs temporal resolution
Advantages — HDR (140dB+), no motion blur, low latency, low power
Challenges — No absolute intensity, sparse asynchronous output, requires new algorithms
Event representations — Event frames, time surfaces, voxel grids, spike tensors
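Two of the event representations above, the event frame and the time surface, can be sketched with plain Python lists. Events are assumed here to be `(t, x, y, polarity)` tuples with polarity +1 or -1; the decay constant `tau` and the sample events are made up for illustration.

```python
import math


def event_frame(events, height, width):
    """Sum polarities per pixel over the whole event batch."""
    frame = [[0] * width for _ in range(height)]
    for _, x, y, pol in events:
        frame[y][x] += pol
    return frame


def time_surface(events, height, width, t_ref, tau):
    """Exponentially decayed age of the most recent event at each pixel."""
    last = [[None] * width for _ in range(height)]
    for t, x, y, _ in events:
        if t <= t_ref and (last[y][x] is None or t > last[y][x]):
            last[y][x] = t
    # Pixels that never fired stay at 0; a pixel that fired at t_ref gets 1.
    return [
        [0.0 if t is None else math.exp(-(t_ref - t) / tau) for t in row]
        for row in last
    ]


if __name__ == "__main__":
    events = [(0.001, 0, 0, 1), (0.002, 0, 0, 1), (0.003, 2, 1, -1)]
    print(event_frame(events, height=2, width=3))
    print(time_surface(events, height=2, width=3, t_ref=0.003, tau=0.001))
```

The event frame discards timing inside the batch, while the time surface keeps it, which is one reason the choice of representation affects what a downstream tracker can exploit.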
| Resource | Author/Year | Key Concepts |
| --- | --- | --- |
| Event-based Vision Survey | Gallego 2020 | Comprehensive survey of event camera algorithms |
| Awesome-Event-based-SLAM | KwanWaiPang | Curated GitHub list of event-based SLAM papers |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| EVO | Rebecq 2017 | Event-based Visual Odometry, 3D reconstruction from events |
| ESVO | Zhou 2021 | Event-based Stereo Visual Odometry |
| Ultimate-SLAM | Vidal 2018 | Events + frames + IMU fusion |
| EKLT | Gehrig 2020 | Event-based KLT feature tracking |
| ESVIO | Chen 2023 | Event-based Stereo VIO |
| EDS | Hidalgo-Carrió 2022 | Event-aided direct sparse odometry |
| DEVO | Klenk 2024 | Deep event-based visual odometry (DPVO style) |
| VIO-GO | 2025 | Event-based VIO with optimized parameters for HDR scenarios |
Level 11: World Models & Spatial AI
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| GAIA-1 | Wayve 2023 | Driving World Model, action-conditioned future scene generation |
| Sora / DiT | OpenAI 2024 | Diffusion Transformer, spacetime patches, emergent 3D understanding |
| NVIDIA Cosmos | NVIDIA 2026 | World Foundation Model platform for Physical AI, synthetic data for AV/robots |
| World Labs / Marble | Fei-Fei Li 2026 | 3D world generation from images/video/text ($1B funding) |
| WorldVLA | Alibaba 2025 | Autoregressive action world model, learns physics for action generation |
| SceneDINO | 2025 | Feed-forward unsupervised semantic scene completion |
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| DreamFusion | Poole 2023 | Text-to-3D via Score Distillation Sampling (SDS) + NeRF |
Vision-Language Models (VLM)
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| CLIP | Radford (OpenAI) 2021 | Contrastive image-text pretraining, 400M pairs, zero-shot |
| SigLIP | Zhai (Google) 2023 | Sigmoid loss CLIP, more efficient, better at small model sizes |
| BLIP-2 | Li (Salesforce) 2023 | Q-Former bridges frozen LLM + image encoder |
| LLaVA | Liu 2023 | LLaMA + vision, conversational VLM |
Vision-Language-Action Models (VLA)
| System | Author/Year | Key Concepts |
| --- | --- | --- |
| RT-2 | Brohan (DeepMind) 2023 | Robot actions as text tokens, emergent generalization |
| OpenVLA | Kim 2024 | Open-source VLA, SigLIP + Llama 7B + Action Head |
| NaVILA | 2024 | Navigation-specialized VLA, SLAM integration for localization |
| Resource | Key Concepts |
| --- | --- |
| Awesome-Transformer-based-SLAM | Curated GitHub list of Transformer-based SLAM methods |
| Book | Author | Key Topics |
| --- | --- | --- |
| Introduction to Visual SLAM | Xiang Gao et al. | VO, optimization, Lie algebra, backend, loop closure — best entry-level SLAM book |
| Photogrammetric Computer Vision | Wolfgang Förstner & Bernhard Wrobel | Camera geometry, estimation, 3D reconstruction — mathematically rigorous |
| Multiple View Geometry in Computer Vision | Richard Hartley & Andrew Zisserman | Epipolar geometry, trifocal tensor, reconstruction — THE bible |
| Computer Vision: Algorithms and Applications | Richard Szeliski | Feature detection, stereo, motion, 3D — comprehensive reference (2nd ed. free PDF) |
| Resource | Link |
| --- | --- |
| changh95/slam_lecture_codes | GitHub — Hands-on SLAM lecture code collection |
If you think any of the roadmaps can be improved, please open a PR with your updates or submit an issue. I will continue to improve this, so you might want to watch or star this repository and revisit it.
Also, check out my GitHub and blog 😺
Open pull request with improvements
Discuss ideas in issues
Spread the word
Reach out to me directly at hyunggi.chang95[at]gmail.com.
To discuss any topics or ask questions, please use the issue tab.
This project is licensed under the MIT License:
Copyright © 2026 Hyunggi Chang.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.