Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • DM-VIO is a monocular visual-inertial odometry system
  • Uses two novel techniques called delayed marginalization and pose graph bundle adjustment
  • Photometric bundle adjustment with dynamic weight for visual residuals
  • Delayed marginalization allows for injection of IMU information into already marginalized states
  • IMU initialization captures full photometric uncertainty and improves scale estimation
  • System evaluated on EuRoC, TUM-VI, and 4Seasons datasets
  • Outperforms stereo-inertial methods while using only a single camera and IMU

Paper Content

I. Introduction

  • Visual-inertial odometry is used in robotics, autonomous driving, and augmented reality.
  • Combining cameras and inertial measurement units (IMUs) is a popular and accurate choice.
  • IMU initialization can take a long time and can worsen performance if initialized prematurely.
  • Stereo-inertial methods have outperformed mono-inertial ones in the past.
  • This paper proposes a novel method for monocular visual-inertial odometry.
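  • Delayed marginalization is proposed to address three questions: how to capture the full photometric uncertainty for IMU initialization, how to inject IMU information into an existing marginalization prior, and how to relinearize variables once the scale estimate changes.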
  • The system is evaluated on three challenging datasets.
  • The system exceeds the state of the art in visual-inertial odometry.
  • Most visual odometry and SLAM systems are feature-based
  • Direct methods have been proposed to optimize a photometric error function
  • Mourikis and Roumeliotis showed that combining visual and inertial measurements can increase accuracy and robustness
  • Many tightly-coupled visual-inertial odometry and SLAM systems have been proposed
  • Initialization of monocular visual-inertial systems is not trivial
  • Most systems start with a visual-only system and use its output for a separate IMU initialization
  • VI-DSO initializes immediately with an arbitrary scale and explicitly optimizes the scale in the main system

III. Method

A. Notation

  • Denote vectors as bold lowercase letters
  • Denote matrices as bold uppercase letters
  • Denote scalars as lowercase letters
  • Denote functions as uppercase letters
  • Represent transformation from camera to world in visual coordinate frame
  • Represent poses in visual frame or inertial frame
  • Define subtraction operator for states

B. Direct visual-inertial bundle adjustment

  • DM-VIO uses a combined energy function to optimize visual and IMU variables.
  • Visual part is based on DSO which is accurate and robust.
  • IMU data is integrated into bundle adjustment using preintegration.
  • Optimized variables include scale, gravity direction, poses in visual and IMU frames, velocity, bias, affine brightness parameters, and inverse depths of active points.
  • Photometric weight is dynamically adjusted based on root mean squared photometric error.
  • IMU error is calculated using preintegration.
  • Old variables are marginalized using Schur complement.
  • Maximum of 8 keyframes are kept during bundle adjustment.
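
The dynamic photometric weight described above can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes a threshold form in which the weight stays constant while the root mean squared photometric error is below a threshold `theta` and decays as `(theta / rmse)^2` above it. The function names, `theta = 8.0`, and `static_weight = 1.0` are illustrative assumptions and should be checked against the paper.

```python
import math


def photometric_rmse(residuals):
    """Root mean squared photometric error over all active residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def dynamic_photometric_weight(rmse, theta=8.0, static_weight=1.0):
    """Weight applied to the photometric energy term.

    Assumed form: constant below `theta`, decaying as (theta / rmse)^2
    above it. `theta` and `static_weight` are illustrative values.
    """
    if rmse < theta:
        return static_weight
    return static_weight * (theta / rmse) ** 2


def total_energy(e_photo, e_imu, e_prior, residuals):
    """Combined energy: weighted photometric term plus IMU and prior terms."""
    weight = dynamic_photometric_weight(photometric_rmse(residuals))
    return weight * e_photo + e_imu + e_prior
```

With this form, poor image quality (a large photometric error) lowers the influence of the visual term, so the IMU term carries more of the estimate.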

D. Delayed marginalization

  • Marginalization with the Schur complement captures the full probability distribution
  • Solving the smaller marginalized system is equivalent to solving the larger original system
  • Reverting a marginalization is not possible without redoing the whole procedure
  • First-Estimates Jacobians (FEJ) are needed to keep the marginalization prior consistent
  • Delayed marginalization circumvents drawbacks of marginalization while retaining advantages
  • Delayed marginalization enables capturing full photometric probability distribution
  • Delayed marginalization allows updating marginalization prior with IMU information
  • Delayed marginalization allows relinearizing variables in Markov blanket while keeping visual and inertial information
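
The idea can be illustrated on a small linear system. The sketch below is a toy model under stated assumptions, not the paper's implementation: every variable is a scalar, the system is the linear normal equation `H x = b`, and the names `marginalize`, `DelayedMarginalizer`, `add_to_delayed`, and `readvance` are hypothetical. It keeps a main prior that marginalizes immediately and a second graph whose marginalization lags by `delay` steps, so that new information can still be attached to variables the main prior has already dropped.

```python
from collections import deque


def marginalize(H, b, ids, k):
    """Remove scalar variable `k` from H x = b via the Schur complement."""
    p = ids.index(k)
    keep = [i for i in range(len(ids)) if i != p]
    H_new = [[H[i][j] - H[i][p] * H[p][j] / H[p][p] for j in keep] for i in keep]
    b_new = [b[i] - H[i][p] * b[p] / H[p][p] for i in keep]
    return H_new, b_new, [ids[i] for i in keep]


class DelayedMarginalizer:
    """Main prior plus a second graph whose marginalization lags by `delay`."""

    def __init__(self, H, b, ids, delay):
        self.main = (H, b, list(ids))
        self.delayed = (H, b, list(ids))
        self.pending = deque()  # variables dropped from main, still in delayed
        self.delay = delay

    def marginalize(self, k):
        self.main = marginalize(*self.main, k)
        self.pending.append(k)
        while len(self.pending) > self.delay:
            self.delayed = marginalize(*self.delayed, self.pending.popleft())

    def add_to_delayed(self, k, weight, target):
        """Inject a factor 0.5 * weight * (x_k - target)^2 into the delayed graph."""
        H, b, ids = self.delayed
        p = ids.index(k)
        H = [row[:] for row in H]
        b = b[:]
        H[p][p] += weight
        b[p] += weight * target
        self.delayed = (H, b, ids)

    def readvance(self):
        """Marginalize all pending variables, yielding an up-to-date prior."""
        graph = self.delayed
        for k in self.pending:
            graph = marginalize(*graph, k)
        return graph


# Toy chain of five scalar variables; the values are made up.
H = [[4.0, 1.0, 0.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0, 0.0],
     [0.0, 1.0, 4.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0, 4.0, 5.0]

dm = DelayedMarginalizer(H, b, ids=[0, 1, 2, 3, 4], delay=2)
for k in (0, 1, 2):
    dm.marginalize(k)

readvanced = dm.readvance()          # same prior as dm.main
dm.add_to_delayed(2, weight=10.0, target=0.0)
updated = dm.readvance()             # prior that includes the injected factor
```

Without injected factors, readvancing reproduces the main prior. After a factor is attached to variable 2, which the main prior has already marginalized, readvancing yields a different prior on the remaining variables. That is the property the main prior alone cannot offer.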

E. Pose graph bundle adjustment for IMU initialization

  • PGBA utilizes delayed marginalization for IMU initialization
  • Graph is populated with IMU factors and optimized
  • At most N_f − 2 poses without IMU variables
  • The experiments use N_f = 8 keyframes and a marginalization delay of d = 100
  • Optimized with GTSAM library and Levenberg-Marquardt optimizer
  • Combination of regular pose graph optimization and bundle adjustment
  • Readvancing captures all visual and inertial information

F. Robust multi-stage IMU initialization

  • Initialization strategy based on 3 insights
  • When only some variables are unknown, optimize those first; the best result comes from optimizing all variables jointly, which captures the full covariance
  • Connected variables must be close to optimum for marginalization prior to be consistent
  • Coarse IMU Initialization: optimize velocities, bias, gravity direction, scale
  • PGBA IMU initialization: run the PGBA optimization, then threshold on the marginal covariance of the scale to decide whether the result is accepted
  • Marginalization Replacement: monitor scale change, rebuild PGBA graph, readvance to update marginalization prior
  • Delayed marginalization used to update FEJ values, overcome main problem of marginalization
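
Two of the decisions above, thresholding on the marginal covariance of the scale and monitoring the scale for changes, can be illustrated with a one-dimensional toy problem. This is a sketch under strong assumptions, not the paper's optimizer: visual odometry provides displacements up to an unknown scale, the IMU provides metric displacements with noise `sigma`, and the scale is found by linear least squares. The function names, `sigma`, `max_variance`, and `max_ratio` are all hypothetical.

```python
def estimate_scale(visual_steps, imu_steps, sigma=0.05):
    """Least-squares scale s minimizing sum((s * v_i - m_i)^2) / sigma^2.

    Returns the scale and its marginal variance (inverse information).
    """
    sum_vv = sum(v * v for v in visual_steps)
    if sum_vv == 0.0:
        raise ValueError("no motion: the scale is unobservable")
    scale = sum(v * m for v, m in zip(visual_steps, imu_steps)) / sum_vv
    variance = sigma ** 2 / sum_vv
    return scale, variance


def initialization_accepted(scale_variance, max_variance=1e-3):
    """Accept the initialization only if the scale is well constrained."""
    return scale_variance < max_variance


def needs_marginalization_replacement(scale_at_prior, current_scale, max_ratio=1.02):
    """Flag a scale change large enough to warrant rebuilding the prior."""
    ratio = max(current_scale / scale_at_prior, scale_at_prior / current_scale)
    return ratio > max_ratio


# Made-up data: the true scale is 2.0.
scale, variance = estimate_scale([1.0, 2.0, 1.5], [2.0, 4.0, 3.0])
accepted = initialization_accepted(variance)

# With very little motion the scale is poorly constrained and is rejected.
_, weak_variance = estimate_scale([0.01, 0.02], [0.02, 0.04])
weak_accepted = initialization_accepted(weak_variance)
```

The toy example shows why the covariance threshold matters: with little motion the scale estimate exists but is too uncertain to be trusted, so initialization is postponed.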

IV. Results

  • Evaluated method on 3 datasets
  • Supplementary video available
  • Ablation studies and runtime evaluations in supplementary
  • Experiments performed in realtime mode on MacBook Pro 2013
  • Results for ORB-SLAM3 were obtained on a slightly stronger desktop
  • Each sequence is run 10 times on EuRoC and 5 times on the other datasets
  • Results presented in cumulative error plots
  • RMSE and drift reported
  • Tables to compare to other papers
  • Median result for each sequence reported
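
The reporting scheme above (RMSE per run, median per sequence, cumulative error plots) can be sketched as follows. This is an assumed, simplified version of such an evaluation, not the authors' tooling; the function names are hypothetical and the numbers are made up.

```python
import math
from statistics import median


def rmse(errors):
    """Root mean squared error of per-frame position errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def median_per_sequence(results):
    """Map each sequence name to the median error over its runs."""
    return {seq: median(runs) for seq, runs in results.items()}


def cumulative_error_curve(run_errors, thresholds):
    """For each threshold, count the runs whose error is at or below it."""
    return [sum(1 for e in run_errors if e <= t) for t in thresholds]


# Made-up per-run errors (in meters) for two sequences.
results = {"MH_01": [0.10, 0.30, 0.20], "V1_01": [0.05, 0.07, 0.06]}
medians = median_per_sequence(results)

all_runs = [e for runs in results.values() for e in runs]
curve = cumulative_error_curve(all_runs, thresholds=[0.06, 0.15, 0.5])
```

A cumulative error curve summarizes every run at once: a method whose curve rises earlier has more runs with small error, which makes both accuracy and robustness visible in one plot.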

A. EuRoC dataset

  • EuRoC dataset is the most popular visual-inertial dataset
  • Our method outperforms all other methods in terms of RMSE
  • Lowest average scale error reported on the dataset
  • Tracking takes 10.34ms on average
  • Keyframe processing takes 53.67ms
  • Delayed marginalization adds 0.44ms overhead

B. TUM-VI dataset

  • TUM-VI dataset is a challenging handheld dataset
  • Our method outperforms other monocular and stereo methods
  • ORB-SLAM3 has an advantage due to its loop closure system
  • Our method is more robust overall

C. 4Seasons dataset

  • 4Seasons dataset is a recent automotive dataset with a well time-synchronized visual-inertial sensor
  • Bottom 96 pixels of images are cropped off
  • IMU noise parameters are determined the same way for all methods
  • Noise models are inflated by factors of 1, 10, 100, and 1000, and the best setting is used
  • Visual initializer is modified for VI-DSO and DM-VIO
  • Automotive scenario is challenging for monocular methods
  • DM-VIO outperforms stereo-inertial ORB-SLAM3 and Basalt using monocular images and no loop closures

V. Conclusion and future work

  • We present a monocular visual-inertial odometry system that outperforms the state of the art, even stereo-inertial methods.
  • Our system works well in flying, handheld, and automotive scenarios.
  • The foundation of our IMU initialization is delayed marginalization, which also enables the pose graph bundle adjustment.
  • We anticipate that this method will spark further research in this direction.
  • The idea of delayed marginalization could be applied to more use cases.
  • The pose graph bundle adjustment can also be applied to long-term loop closures.
  • Our open-source system is easily extendible.
  • We provide an ablation study on different parts of the IMU initializer.
  • We provide an ablation study on the impact of the dynamic photometric weight.
  • We perform extensive runtime analysis on a MacBook Pro 2013.
  • The only regular overhead is the delayed marginalization, taking 0.44ms.
  • We show the mean over all 110 runs on the EuRoC dataset.
  • The dynamic weight provides a noticeable improvement in robustness.
  • DM-VIO outperforms even state-of-the-art stereo-inertial methods by a large margin.