Abstract
- Proposes ∇-RANSAC, a fully differentiable RANSAC
- Predicts inlier probabilities of input data points
- Exploits predictions in a guided sampler
- Estimates model parameters and quality while propagating gradients
- Random sampler based on Gumbel Softmax sampler
- Model quality function marginalizes over scores from all models
- Unlocks end-to-end training of geometric estimation pipelines
- Trained with LoFTR to find reliable correspondences
- Tested on real-world datasets for fundamental and essential matrix estimation
- Superior to state-of-the-art in terms of accuracy and speed
Paper Content
Introduction
- Direct optimization on test-time evaluation metric has been beneficial for deep learning in vision tasks
- When the evaluation metric is non-differentiable, training on it directly is infeasible, so a differentiable surrogate of the metric is used instead
- Examples of surrogate losses include average precision and recall@k for image retrieval, perceptual loss for image compression, intersection-over-union loss for object detection, and edit distance loss for scene text recognition
- RANSAC is widely used for robust estimation in vision pipelines
- RANSAC variants have been proposed to improve components of the original algorithm
- ∇-RANSAC is proposed to make RANSAC end-to-end differentiable
- ∇-RANSAC allows robust estimators to use test-time evaluation metrics to optimize end-to-end training
- ∇-RANSAC is trained with a detector-free feature matcher, LoFTR, to improve accuracy
Fully differentiable RANSAC
- Input is set of tentative point correspondences with extra info from detector and matcher
- Consensus learning uses the pruning block from the recent work [89]
- ∇-RANSAC iteratively draws random samples of m data points
- Sampler is Gumbel Softmax Sampler using input probabilities as guidance
- Differentiable minimal solver estimates model parameters from drawn sample
- Model quality is computed in a supervised way using ground truth
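The hypothesize-and-verify loop outlined above can be illustrated with a toy problem. The sketch below is not the paper's pipeline: it swaps epipolar geometry for 2D line fitting (minimal sample size m = 2), uses plain weighted sampling instead of the Gumbel Softmax sampler, and scores models by inlier count. The function names, the toy data and the `probs` values are all assumptions made for illustration.

```python
import random

def fit_line(p, q):
    """Toy minimal solver: line y = a*x + b through two points (m = 2)."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None  # degenerate sample: a vertical line is not representable
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

def count_inliers(model, points, threshold):
    """Model quality as the number of points within the residual threshold."""
    a, b = model
    return sum(abs(y - (a * x + b)) < threshold for x, y in points)

def guided_ransac(points, probs, iterations=100, threshold=0.1, seed=0):
    """Sample minimal sets guided by per-point inlier probabilities."""
    rng = random.Random(seed)
    best_model, best_score = None, -1
    for _ in range(iterations):
        i, j = rng.choices(range(len(points)), weights=probs, k=2)
        if i == j:
            continue  # the same point was drawn twice; skip this sample
        model = fit_line(points[i], points[j])
        if model is None:
            continue
        score = count_inliers(model, points, threshold)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score

# Toy data: 20 points on y = 2x + 1 plus 5 gross outliers.
points = [(x / 10, 2 * x / 10 + 1) for x in range(20)]
points += [(0.3, 9.0), (1.1, -4.0), (0.7, 6.5), (1.5, -2.0), (0.1, 5.0)]
probs = [0.9] * 20 + [0.1] * 5  # hypothetical predicted inlier probabilities
model, score = guided_ransac(points, probs)
print(model, score)
```

Because the weights favor likely inliers, most drawn samples are outlier-free, which is the reason for guiding the sampler with predicted probabilities.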
Gumbel softmax sampler
- ∇-RANSAC requires sampling m data points from a set of n total samples.
- Sampling distribution is either governed by importance scores or follows a uniform distribution.
- Standard sampling operation is either non-differentiable or has sparse gradients.
- Gumbel-Softmax is extended with the straight through trick to make sampling differentiable.
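A minimal sketch of the sampling idea, written with the standard library only so it shows the forward pass but no gradients. Each draw perturbs the log-weights with Gumbel noise, forms a temperature-scaled softmax (the relaxed sample) and takes its argmax as the hard one-hot sample. Drawing m distinct points by masking earlier picks is an assumption of this sketch, as are the function names and the temperature `tau`; the notes do not say how the paper handles repeated draws.

```python
import math
import random

def gumbel_noise(rng):
    """Standard Gumbel(0, 1) noise via the inverse CDF."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))

def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def gumbel_softmax_draw(log_weights, tau, rng):
    """One draw: returns the index, the hard one-hot and the soft relaxation."""
    perturbed = [lw + gumbel_noise(rng) for lw in log_weights]
    soft = softmax([z / tau for z in perturbed])
    index = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == index else 0.0 for i in range(len(soft))]
    # Straight-through trick: in an autograd framework the forward pass would
    # output `hard` while gradients flow through `soft`, for example by
    # computing hard + soft - soft.detach() in PyTorch.
    return index, hard, soft

def sample_minimal_set(probs, m, tau=1.0, seed=0):
    """Draw m distinct indices, masking earlier picks (an assumption)."""
    rng = random.Random(seed)
    log_w = [math.log(p + 1e-12) for p in probs]
    chosen = []
    for _ in range(m):
        index, _, _ = gumbel_softmax_draw(log_w, tau, rng)
        chosen.append(index)
        log_w[index] = -math.inf  # exclude this point from later draws
    return chosen

probs = [0.9, 0.8, 0.1, 0.05, 0.7, 0.6, 0.2, 0.95]  # hypothetical importance scores
chosen = sample_minimal_set(probs, m=3)
print(chosen)
```

Passing equal values in `probs` reduces the guided sampler to uniform sampling, which covers both cases mentioned above.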
Differentiable minimal solver
- Minimal solvers are a part of RANSAC-like hypothesize-and-verify approaches
- Estimate model parameters from a minimal set of data points
- Most minimal solvers are differentiable
- Fundamental and essential matrix estimation are two of the most important problems
- 8PC and 7PC solvers have a degeneracy when points stem from a close-to-planar underlying 3D structure
- 5PC solvers are used in practical applications
- Most minimal solvers return multiple solutions
- Best solution is selected based on evaluation metric
- Source code will be made publicly available
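When a minimal solver returns several candidate models, one has to be picked. The sketch below illustrates that selection step for fundamental matrices, using the mean symmetric epipolar distance as a stand-in metric. The choice of metric, the helper names and the two candidate matrices are assumptions for illustration; the notes only state that the best solution is selected based on the evaluation metric.

```python
import math

def mat_vec(M, v):
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]

def transpose(M):
    return [[M[c][r] for c in range(3)] for r in range(3)]

def symmetric_epipolar_distance(F, p1, p2):
    """Sum of point-to-epipolar-line distances in both images."""
    x1 = [p1[0], p1[1], 1.0]
    x2 = [p2[0], p2[1], 1.0]
    line2 = mat_vec(F, x1)             # epipolar line of x1 in image 2
    line1 = mat_vec(transpose(F), x2)  # epipolar line of x2 in image 1
    residual = abs(sum(a * b for a, b in zip(x2, line2)))  # |x2^T F x1|
    eps = 1e-12  # guards against division by zero for degenerate lines
    return residual * (1.0 / (math.hypot(line2[0], line2[1]) + eps)
                       + 1.0 / (math.hypot(line1[0], line1[1]) + eps))

def select_best(candidates, matches):
    """Return the candidate with the lowest mean error over all matches."""
    def mean_error(F):
        total = sum(symmetric_epipolar_distance(F, p, q) for p, q in matches)
        return total / len(matches)
    return min(candidates, key=mean_error)

# Hypothetical candidates: pure translation along x versus along y.
F_x = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
F_y = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
# Toy matches that shift only horizontally, so F_x explains them exactly.
matches = [((0.1, 0.2), (0.3, 0.2)),
           ((0.5, -0.1), (0.8, -0.1)),
           ((-0.4, 0.6), (-0.1, 0.6))]
best = select_best([F_y, F_x], matches)
print(best is F_x)
```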
Trainable quality function
- RANSAC calculates the quality of an estimated model as the number of inliers
- Other algorithms have improved RANSAC’s performance by better modelling noise
- Some works use soft probabilistic hypothesis selection
- Other methods combine classification loss with regression and geometry-induced losses
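To make the contrast concrete, the sketch below compares the classic hard inlier count with a smooth sigmoid relaxation, and shows a softmax distribution over hypothesis scores of the kind used for soft hypothesis selection. It is an illustrative sketch, not the paper's quality function: the `sharpness` and `temperature` parameters and all function names are assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hard_inlier_count(residuals, threshold):
    """Classic RANSAC quality: number of residuals below the threshold."""
    return sum(1 for r in residuals if r < threshold)

def soft_inlier_count(residuals, threshold, sharpness=10.0):
    """Smooth relaxation of the count; differentiable in the residuals."""
    return sum(sigmoid(sharpness * (threshold - r)) for r in residuals)

def hypothesis_distribution(scores, temperature=1.0):
    """Softmax over hypothesis scores for soft hypothesis selection."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_loss(scores, losses, temperature=1.0):
    """Loss averaged over all hypotheses, weighted by their probabilities."""
    weights = hypothesis_distribution(scores, temperature)
    return sum(w * l for w, l in zip(weights, losses))

residuals = [0.1, 0.5, 2.0, 0.7]  # toy residuals in pixels
print(hard_inlier_count(residuals, 0.75))
print(soft_inlier_count(residuals, 0.75))
print(expected_loss([3.0, 1.0, 0.0], [0.2, 1.5, 4.0]))
```

The hard count has zero gradient almost everywhere, while the soft count and the expected loss change smoothly with the residuals and scores, which is what makes them usable in end-to-end training.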
Training and testing details
- Input of ∇-RANSAC is a set of correspondences obtained by any feature detector and matcher
- Number of matches is fixed to 2000; if more are available, the best 2000 are chosen by matching score
- If fewer than 2000 correspondences are available, the missing entries are filled with zeros
- Local and global features extracted from correspondences by consensus learning block
- Weights initialized with a 1000-epoch procedure that minimizes the Kullback-Leibler divergence
- Gradient clipping used to avoid exploding gradients and accelerate convergence
- Training pipeline implemented in PyTorch
- Inference uses state-of-the-art components and Gumbel Softmax Sampler
- MAGSAC++ model quality function used to select best model
- Inner RANSAC-based local optimization and Levenberg-Marquardt numerical optimization used to improve accuracy
- Testing algorithm implemented in C++
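The fixed-size input described above (best 2000 matches, zero padding otherwise) can be sketched as follows. This is a sketch under assumptions: a higher matching score is taken to be better, each correspondence is a 4-vector (x1, y1, x2, y2), and the function name and small `target` in the example are made up for readability.

```python
def select_and_pad(matches, scores, target=2000, dim=4):
    """Keep the `target` best matches by score, then zero-pad to `target` rows."""
    # Assumption: a higher matching score means a better match.
    order = sorted(range(len(matches)), key=lambda i: scores[i], reverse=True)
    kept = [list(matches[i]) for i in order[:target]]
    # Fill the missing rows with zeros when fewer matches are available.
    kept += [[0.0] * dim for _ in range(target - len(kept))]
    return kept

# Toy example with a small target so the result is easy to inspect.
matches = [(0.1, 0.2, 0.3, 0.4), (0.5, 0.6, 0.7, 0.8), (0.9, 1.0, 1.1, 1.2)]
scores = [0.5, 0.9, 0.1]
padded = select_and_pad(matches, scores, target=5)
truncated = select_and_pad(matches, scores, target=2)
print(padded)
print(truncated)
```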
Experimental results
- Tested epipolar geometry estimation on 13 scenes from the CVPR IMW 2020 PhotoTourism benchmark
- Trained and validated on St. Peter’s Square with 4950 image pairs
- Compared ∇-RANSAC to classical robust estimators and state-of-the-art learning-based methods
- Retrained NG-RANSAC, CLNet, and OANet on same data
- Used SNN ratio, feature scales and orientations as learnable side-information
- Pre-filtered correspondences by SNN ratio threshold of 0.8
- Used 0.75 pixels as inlier-outlier threshold for robust estimators
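The SNN ratio pre-filter mentioned above can be sketched as a simple ratio test: a match is kept when the descriptor distance to its nearest neighbor is sufficiently smaller than the distance to its second nearest neighbor. The function name, the strict `<` comparison and the toy distances are assumptions; only the 0.8 threshold comes from the notes.

```python
def snn_ratio_filter(first_distances, second_distances, max_ratio=0.8):
    """Return (index, ratio) for matches whose SNN ratio is below max_ratio."""
    kept = []
    for i, (d1, d2) in enumerate(zip(first_distances, second_distances)):
        # Treat a zero second-neighbor distance as maximally ambiguous.
        ratio = d1 / d2 if d2 > 0 else 1.0
        if ratio < max_ratio:
            kept.append((i, ratio))
    return kept

# Toy descriptor distances to the first and second nearest neighbors.
first = [0.2, 0.5, 0.9]
second = [0.4, 0.55, 1.0]
kept = snn_ratio_filter(first, second)
print(kept)
```

The surviving ratios can then be passed along as side-information, as the notes describe for the learnable inputs.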
Fundamental matrix estimation
Essential matrix estimation
- Evaluated method for E estimation with same train, validation, and test scenes as F estimation
- Used differentiable 5PC algorithm when training on essential matrix estimation
- Trained end-to-end for 10 epochs with weight initialization and gradient clipping
- Iteration number fixed to 100
- Evaluated the estimated essential matrix by decomposing it into rotation and translation, computing the rotation error R and the translation error t, and reporting the maximum, max(R, t)
- Calculated Area Under the Recall curve (AUC) thresholded at 5°, 10° and 20°
- ∇-RANSAC achieves the highest AUC scores
- Among the minimal solvers, the five-point algorithm achieves the highest AUC scores, confirming the need for better minimal solvers than the 8PC algorithm
- Weight initialization with Kullback-Leibler divergence improves accuracy
- Sampling weights used Sacre Coeur as test set
- Training with the epipolar error as the loss leads to the best results
- ∇-RANSAC can be used to improve end-to-end feature matching approaches
- Best results observed with setup where only LoFTR model is trained and ∇-RANSAC is kept frozen with pre-trained weights
- ∇-RANSAC leads to most accurate fundamental and essential matrices compared to state-of-the-art robust estimators
- Code repository and trained model will be made public
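The pose-error metric and the AUC used in this evaluation can be sketched as follows. Several details are assumptions of the sketch rather than facts from the notes: errors are angular and in degrees, the translation error is the angle between directions (scale is not recoverable from an essential matrix), and the AUC integrates the recall curve with the trapezoidal rule and normalizes by the threshold.

```python
import math

def rotation_error_deg(R_est, R_gt):
    """Angle of the relative rotation R_gt^T R_est, in degrees."""
    # trace(R_gt^T R_est) equals the sum of elementwise products.
    trace = sum(R_est[r][c] * R_gt[r][c] for r in range(3) for c in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error_deg(t_est, t_gt):
    """Angle between the two translation directions, in degrees."""
    dot = sum(a * b for a, b in zip(t_est, t_gt))
    norm = math.sqrt(sum(a * a for a in t_est)) * math.sqrt(sum(b * b for b in t_gt))
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def pose_error_deg(R_est, t_est, R_gt, t_gt):
    """Pose error as the maximum of rotation and translation error."""
    return max(rotation_error_deg(R_est, R_gt), translation_error_deg(t_est, t_gt))

def recall_auc(errors, threshold):
    """Area under the recall-vs-error curve up to threshold, scaled to [0, 1]."""
    errors = sorted(errors)
    n = len(errors)
    xs, ys = [0.0], [0.0]
    for i, e in enumerate(errors):
        if e > threshold:
            break
        xs.append(e)
        ys.append((i + 1) / n)
    xs.append(threshold)
    ys.append(ys[-1])
    area = sum((xs[k + 1] - xs[k]) * (ys[k + 1] + ys[k]) / 2.0
               for k in range(len(xs) - 1))
    return area / threshold

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
error = pose_error_deg(rot_z_90, [1.0, 0.0, 0.0], identity, [1.0, 0.0, 0.0])
toy_errors = [1.0, 3.0, 8.0, 25.0]  # hypothetical per-pair pose errors in degrees
for thr in (5.0, 10.0, 20.0):
    print(thr, recall_auc(toy_errors, thr))
```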