# Deep Learning based Visual Odometry and SLAM Open Project
- Visual odometry built on DPVO (Deep Patch Visual Odometry)
- Multi-dataset support: TartanAir, Redwood (custom datasets)
- FP16/AMP Training: Automatic Mixed Precision for faster training with lower memory usage
- RTX 50 Series Support: Optimized CUDA kernels for Blackwell architecture (sm_120)
- Loop Closure: Optional DBoW2-based loop closure for large-scale SLAM
- GPU: NVIDIA RTX 20/30/40/50 series
- CUDA: 12.8+ (for RTX 50 series)
- Python: 3.12
- PyTorch: 2.9.1+
See INSTALL.md for detailed installation instructions.
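A quick way to verify the toolchain against these requirements is a small check script like the one below (a sketch, not part of the repository; it degrades gracefully when PyTorch is not installed yet):

```python
import sys
import importlib.util

def check_env(min_py=(3, 12)):
    """Report whether the interpreter and (optionally) PyTorch/CUDA meet
    the requirements listed above. Returns a dict of findings."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": sys.version_info[:2] >= min_py,
        "torch": None, "cuda": None, "sm": None,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch
        info["torch"] = torch.__version__                # want 2.9.1+
        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability(0)
            info["cuda"] = torch.version.cuda            # want 12.8+ for RTX 50xx
            info["sm"] = f"sm_{major}{minor}"            # e.g. sm_120 on Blackwell
    return info

print(check_env())
```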
```bash
# Create conda environment
conda create -n visual_slam python=3.12
conda activate visual_slam

# Install PyTorch
pip install torch torchvision

# Install dependencies
pip install tensorboard numba tqdm einops pypose kornia numpy plyfile evo opencv-python yacs

# Build CUDA extensions
export TORCH_CUDA_ARCH_LIST="12.0"  # Adjust for your GPU
cd methods/dpvo
pip install --no-build-isolation .
```

TartanAir:
```
# Download TartanAir dataset and structure as:
# datasets/TartanAir/{scene}/Easy/{P00X}/
```

Redwood (Custom):
```bash
# Build pose pickle for the Redwood dataset
python methods/dpvo/scripts/build_redwood_pickle.py \
    --datapath datasets/redwood \
    --mode train \
    --output datasets/redwood/cache/Redwood_train.pickle

# Train on TartanAir
python methods/dpvo/train.py --config methods/dpvo/config/tartan_train.yaml

# Fine-tune on Redwood
python methods/dpvo/train.py --config methods/dpvo/config/redwood_train.yaml

# Run demo with visualization
python methods/dpvo/demo.py \
    --imagedir=<path_to_images> \
    --calib=<path_to_calibration> \
    --stride=1 --viz
```
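DPVO-style demos typically read the calibration file as plain text containing the pinhole intrinsics `fx fy cx cy` (confirm the exact format in `methods/dpvo/demo.py`). A sketch of writing such a file, using `write_pinhole_calib` as a hypothetical helper name:

```python
from pathlib import Path

def write_pinhole_calib(path, fx, fy, cx, cy):
    """Write a one-line 'fx fy cx cy' calibration file (hypothetical helper;
    verify the expected format against methods/dpvo/demo.py)."""
    Path(path).write_text(f"{fx} {fy} {cx} {cy}\n")

# Example: TartanAir's published 640x480 pinhole intrinsics
write_pinhole_calib("calib.txt", 320.0, 320.0, 320.0, 240.0)
```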
```bash
# Evaluate on TartanAir
python methods/dpvo/evaluate_tartan.py \
    --weights=checkpoints/dpvo.pth \
    --datapath=datasets/TartanAir
```

Training is configured via YAML files. Key options:
```yaml
training:
  name: experiment_name
  steps: 240000
  lr: 0.0008
  amp: true  # Enable FP16 training
dataloader:
  batch_size: 1
  num_workers: 8  # Parallel data loading
```

FP16 training is enabled by setting `amp: true` in the config. This provides:
- ~30% faster training
- ~40% less GPU memory usage
- Maintained numerical accuracy (< 1% relative error)
FP16 Compatibility:
- `cuda_corr`: feature maps (`fmap1`, `fmap2`) use FP16; coordinates use FP32
- `lietorch`: all Lie group operations use FP32 internally for numerical stability
- `kabsch_umeyama`: SVD operations use FP32 (autocast disabled)
See INSTALL.md for implementation details.
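The FP32 fallback for SVD follows a standard pattern: locally disable autocast and upcast the input before the numerically sensitive op. A sketch (the function name is illustrative, not the project's API):

```python
import torch

def svd_fp32(mat):
    """Run SVD in FP32 even when called inside an autocast region,
    mirroring the 'autocast disabled' note for kabsch_umeyama above."""
    with torch.autocast(mat.device.type, enabled=False):
        return torch.linalg.svd(mat.float())

U, S, Vh = svd_fp32(torch.randn(3, 3))
print(S.dtype)  # torch.float32 regardless of the surrounding autocast dtype
```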
```
Deep-Visual-Odometry-SLAM/
├── methods/
│   └── dpvo/                # Main DPVO implementation
│       ├── dpvo/            # Core modules
│       │   ├── altcorr/     # CUDA correlation kernels (FP16 supported)
│       │   ├── fastba/      # CUDA bundle adjustment
│       │   ├── lietorch/    # Lie group operations (FP32 internally)
│       │   └── data_readers/  # Dataset loaders
│       ├── config/          # Training configurations
│       ├── train.py         # Training script (AMP support)
│       └── INSTALL.md       # Installation guide
├── modules/
│   ├── eigen-3.4.0/         # Eigen3 library
│   ├── DBoW2/               # Loop closure vocabulary
│   ├── DPRetrieval/         # Place recognition
│   ├── DPViewer/            # 3D visualization
│   └── Pangolin/            # OpenGL viewer
├── datasets/                # Dataset storage
└── checkpoints/             # Model checkpoints
```
| GPU Series | Architecture | Compute Capability | NVCC Flag |
|---|---|---|---|
| RTX 50xx | Blackwell | 12.0 | sm_120 |
| RTX 40xx | Ada Lovelace | 8.9 | sm_89 |
| RTX 30xx | Ampere | 8.6 | sm_86 |
| RTX 20xx | Turing | 7.5 | sm_75 |
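The table rows follow mechanically from the GPU's compute capability (available via `torch.cuda.get_device_capability()`); a small helper (hypothetical, matching the table above) shows the mapping:

```python
def arch_flags(capability):
    """Map a (major, minor) compute capability to the TORCH_CUDA_ARCH_LIST
    value and the NVCC sm_ flag, e.g. (12, 0) -> ("12.0", "sm_120")."""
    major, minor = capability
    return f"{major}.{minor}", f"sm_{major}{minor}"

# With PyTorch installed: arch_flags(torch.cuda.get_device_capability(0))
print(arch_flags((8, 9)))  # ('8.9', 'sm_89') for RTX 40xx (Ada Lovelace)
```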
This project is based on:
- DPVO - Deep Patch Visual Odometry
- TartanAir - Challenging Visual SLAM Dataset
- lietorch - Lie Groups for PyTorch
MIT License