Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Proposed DM-VIO system is a monocular visual-inertial odometry system
- Uses two novel techniques called delayed marginalization and pose graph bundle adjustment
- Photometric bundle adjustment with dynamic weight for visual residuals
- Delayed marginalization allows for injection of IMU information into already marginalized states
- IMU initialization captures full photometric uncertainty and improves scale estimation
- System evaluated on EuRoC, TUM-VI, and 4Seasons datasets
- Outperforms stereo-inertial methods while using only a single camera and IMU
Paper Content
I. introduction
- Visual-inertial odometry is used in robotics, autonomous driving, and augmented reality.
- Combining cameras and inertial measurement units (IMUs) is a popular and accurate choice.
- IMU initialization can take a long time and can worsen performance if initialized prematurely.
- Stereo-inertial methods have outperformed mono-inertial ones in the past.
- This paper proposes a novel method for monocular visual-inertial odometry.
- Delayed marginalization is proposed to address three questions.
- The system is evaluated on three challenging datasets.
- The system exceeds the state of the art in visual-inertial odometry.
Ii. related work
- Most visual odometry and SLAM systems are feature-based
- Direct methods have been proposed to optimize a photometric error function
- Mourikis and Roumeliotis showed that visual and inertial measurements can increase accuracy and robustness
- Many tightly-coupled visual-inertial odometry and SLAM systems have been proposed
- Initialization of monocular visual-inertial systems is not trivial
- Most systems start with a visual-only system and use its output for a separate IMU initialization
- VI-DSO initializes immediately with an arbitrary scale and explicitly optimizes the scale in the main system
Iii. method
A. notation
- Denote vectors as bold lowercase letters
- Denote matrices as bold uppercase letters
- Denote scalars as lowercase letters
- Denote functions as uppercase letters
- Represent transformation from camera to world in visual coordinate frame
- Represent poses in visual frame or inertial frame
- Define subtraction operator for states
B. direct visual-inertial bundle adjustment
- DM-VIO uses a combined energy function to optimize visual and IMU variables.
- Visual part is based on DSO which is accurate and robust.
- IMU data is integrated into bundle adjustment using preintegration.
- Optimized variables include scale, gravity direction, poses in visual and IMU frames, velocity, bias, affine brightness parameters, and inverse depths of active points.
- Photometric weight is dynamically adjusted based on root mean squared photometric error.
- IMU error is calculated using preintegration.
- Old variables are marginalized using Schur complement.
- Maximum of 8 keyframes are kept during bundle adjustment.
D. delayed marginalization
- Captures full probability distribution
- Solving smaller system is equivalent to solving larger original system
- Reverting marginalization not possible without redoing whole procedure
- FEJ needed to keep marginalization prior consistent
- Delayed marginalization circumvents drawbacks of marginalization while retaining advantages
- Delayed marginalization enables capturing full photometric probability distribution
- Delayed marginalization allows updating marginalization prior with IMU information
- Delayed marginalization allows relinearizing variables in Markov blanket while keeping visual and inertial information
E. pose graph bundle adjustment for imu initialization
- PGBA utilizes delayed marginalization for IMU initialization
- Graph is populated with IMU factors and optimized
- At most Nf-2 poses without IMU variables
- Nf = 8 and delay d = 100
- Optimized with GTSAM library and Levenberg-Marquardt optimizer
- Combination of regular pose graph optimization and bundle adjustment
- Readvancing captures all visual and inertial information
F. robust multi-stage imu initialization
- Initialization strategy based on 3 insights
- Optimize unknown variables first, capture full covariance when optimizing all variables
- Connected variables must be close to optimum for marginalization prior to be consistent
- Coarse IMU Initialization: optimize velocities, bias, gravity direction, scale
- PGBA IMU Init.: optimize, threshold on marginal covariance for scale
- Marginalization Replacement: monitor scale change, rebuild PGBA graph, readvance to update marginalization prior
- Delayed marginalization used to update FEJ values, overcome main problem of marginalization
Iv. results
- Evaluated method on 3 datasets
- Supplementary video available
- Ablation studies and runtime evaluations in supplementary
- Experiments performed in realtime mode on MacBook Pro 2013
- Results for ORB-SLAM3 on slightly stronger desktop
- Evaluated 10 times for EuRoC and 5 times for other datasets
- Results presented in cumulative error plots
- RMSE and drift reported
- Tables to compare to other papers
- Median result for each sequence reported
A. euroc dataset
- EuRoC dataset is the most popular visual-inertial dataset
- Our method outperforms all other methods in terms of RMSE
- Lowest average scale error reported on the dataset
- Tracking takes 10.34ms on average
- Keyframe processing takes 53.67ms
- Delayed marginalization adds 0.44ms overhead
B. tum-vi dataset
- TUM-VI dataset is a challenging handheld dataset
- Our method outperforms other monocular and stereo methods
- ORB-SLAM3 has an advantage due to its loop closure system
- Our method is more robust overall
C. 4seasons dataset
- 4Seasons dataset is a recent automotive dataset with a well time-synchronized visual-inertial sensor
- Bottom 96 pixels of images are cropped off
- IMU noise parameters are determined the same way for all methods
- Noise models are inflated by 1, 10, 100, 1000 to determine best setting
- Visual initializer is modified for VI-DSO and DM-VIO
- Automotive scenario is challenging for monocular methods
- DM-VIO outperforms stereo-inertial ORB-SLAM3 and Basalt using monocular images and no loop closures
V. conclusion and future work
- We present a monocular visual-inertial odometry system that outperforms the state of the art, even stereo-inertial methods.
- Our system works well in flying, handheld, and automotive scenarios.
- The foundation of our IMU initialization is delayed marginalization, which also enables the pose graph bundle adjustment.
- We anticipate that this method will spark further research in this direction.
- The idea of delayed marginalization could be applied to more use cases.
- The pose graph bundle adjustment can also be applied to long-term loop closures.
- Our open-source system is easily extendible.
- We provide an ablation study on different parts of the IMU initializer.
- We provide an ablation study on the impact of the dynamic photometric weight.
- We perform extensive runtime analysis on a Mac-Book Pro 2013.
- The only regular overhead is the delayed marginalization, taking 0.44ms.
- We show the mean over all 110 runs on the EuRoC dataset.
- The dynamic weight provides a noticeable improvement in robustness.
- DM-VIO outperforms even state-of-the-art stereo-inertial methods by a large margin.