Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Directly optimizing the test-time evaluation metric has been beneficial for deep learning in vision tasks
Training a model directly on the evaluation metric is infeasible when the metric is non-differentiable, so a surrogate of the metric is used for training
Examples of surrogate losses include average precision and recall@k for image retrieval, perceptual loss for image compression, intersection-over-union loss for object detection, and edit distance loss for scene text recognition
RANSAC is widely used for robust estimation in vision pipelines
RANSAC variants have been proposed to improve components of the original algorithm
∇-RANSAC is proposed to make RANSAC end-to-end differentiable
∇-RANSAC allows robust estimators to use test-time evaluation metrics to optimize end-to-end training
∇-RANSAC is trained with a detector-free feature matcher, LoFTR, to improve accuracy

Input is a set of tentative point correspondences with side information from the detector and matcher
Consensus learning uses the pruning block from recent work [89]
∇-RANSAC iteratively draws random samples of m data points
Sampler is Gumbel Softmax Sampler using input probabilities as guidance
Differentiable minimal solver estimates model parameters from drawn sample
Model quality is computed in a supervised way using ground truth

∇-RANSAC requires sampling m data points from a set of n total samples.
Sampling distribution is either governed by importance scores or follows a uniform distribution.
Standard sampling operation is either non-differentiable or has sparse gradients.
Gumbel-Softmax is extended with the straight-through trick to make sampling differentiable.
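
The sampling step above can be illustrated with a minimal sketch of the forward pass only, written in plain Python. The function names `gumbel_softmax_sample` and `sample_minimal_set`, the temperature `tau`, and the masking used to avoid drawing the same point twice are assumptions made for illustration and are not taken from the paper. Plain Python has no automatic differentiation, so the straight-through part appears only as a comment.

```python
import math
import random


def gumbel_softmax_sample(log_weights, tau=1.0, rng=random):
    """Return (soft, hard): a relaxed one-hot vector and its argmax one-hot."""
    perturbed = []
    for lw in log_weights:
        # Clamp u so that both logarithms below stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        perturbed.append((lw + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled scores.
    peak = max(perturbed)
    exps = [math.exp(p - peak) for p in perturbed]
    total = sum(exps)
    soft = [e / total for e in exps]
    best = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == best else 0.0 for i in range(len(soft))]
    # Straight-through trick (needs an autograd framework): use
    # hard + soft - stop_gradient(soft), so the forward value equals `hard`
    # while gradients follow `soft`.
    return soft, hard


def sample_minimal_set(weights, m, tau=1.0, seed=None):
    """Draw m distinct indices, guided by non-negative importance weights."""
    if m > sum(1 for w in weights if w > 0):
        raise ValueError("not enough points with positive weight")
    rng = random.Random(seed)
    log_w = [math.log(w) if w > 0 else float("-inf") for w in weights]
    chosen = []
    for _ in range(m):
        _, hard = gumbel_softmax_sample(log_w, tau, rng)
        idx = hard.index(1.0)
        chosen.append(idx)
        log_w[idx] = float("-inf")  # mask so the point is not drawn again
    return chosen
```

Passing equal weights corresponds to the uniform case; passing importance scores makes high-scoring points more likely to be drawn.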

Minimal solvers are a part of RANSAC-like hypothesize-and-verify approaches
Estimate model parameters from a minimal set of data points
Most minimal solvers are differentiable
Fundamental and essential matrix estimation are two of the most important problems
8PC and 7PC solvers have a degeneracy when points stem from a close-to-planar underlying 3D structure
5PC solvers are used in practical applications
Most minimal solvers return multiple solutions
Best solution is selected based on evaluation metric
Source code will be made publicly available
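
Selecting the best of several solver outputs can be sketched as follows. The symmetric epipolar distance is used here as the metric purely as an assumption for illustration; the notes above do not say which metric is applied at this step. The helper names `symmetric_epipolar_distance`, `mean_error` and `select_best` are hypothetical.

```python
import math


def mat_vec(M, v):
    """Multiply a 3x3 matrix (nested lists) by a 3-vector."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]


def symmetric_epipolar_distance(F, x1, x2):
    """Mean point-to-epipolar-line distance over both images, in pixels."""
    p1 = [x1[0], x1[1], 1.0]
    p2 = [x2[0], x2[1], 1.0]
    line2 = mat_vec(F, p1)  # epipolar line of x1 in image 2
    F_t = [[F[j][i] for j in range(3)] for i in range(3)]
    line1 = mat_vec(F_t, p2)  # epipolar line of x2 in image 1
    residual = abs(sum(a * b for a, b in zip(p2, line2)))  # |x2^T F x1|
    d2 = residual / max(math.hypot(line2[0], line2[1]), 1e-12)
    d1 = residual / max(math.hypot(line1[0], line1[1]), 1e-12)
    return 0.5 * (d1 + d2)


def mean_error(F, correspondences):
    """Average symmetric epipolar distance over (x1, x2) pairs."""
    total = sum(symmetric_epipolar_distance(F, x1, x2)
                for x1, x2 in correspondences)
    return total / len(correspondences)


def select_best(candidates, correspondences):
    """Return the candidate matrix with the lowest mean error."""
    return min(candidates, key=lambda F: mean_error(F, correspondences))
```

For example, with correspondences that only shift horizontally, a matrix describing translation along x has zero error and is preferred over one describing translation along y.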

RANSAC calculates the quality of an estimated model as the number of inliers
Other algorithms have improved RANSAC’s performance by better modelling noise
Some works use soft probabilistic hypothesis selection
Other methods combine classification loss with regression and geometry-induced losses
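
The classical inlier-counting score can be shown on a toy problem. The sketch below fits a 2D line instead of an epipolar geometry so that it stays short; the two-point minimal solver, the function names and the default of 100 iterations are assumptions made for illustration, not the paper's setup.

```python
import math
import random


def line_through(p, q):
    """Minimal solver: line a*x + b*y + c = 0 with unit normal through p and q."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # degenerate sample: the two points coincide
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])


def count_inliers(line, points, threshold):
    """Model quality: number of points within `threshold` of the line."""
    a, b, c = line
    return sum(1 for x, y in points if abs(a * x + b * y + c) <= threshold)


def ransac_line(points, threshold, iterations=100, seed=0):
    """Hypothesize-and-verify loop keeping the model with the most inliers."""
    rng = random.Random(seed)
    best_line, best_score = None, -1
    for _ in range(iterations):
        p, q = rng.sample(points, 2)  # minimal sample of m = 2 points
        line = line_through(p, q)
        if line is None:
            continue
        score = count_inliers(line, points, threshold)
        if score > best_score:
            best_line, best_score = line, score
    return best_line, best_score
```

The hard threshold inside `count_inliers` is what makes this score piecewise constant, which is one reason the alternatives listed above replace it with softer or learned quality measures.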

Input of ∇-RANSAC is a set of correspondences obtained by any feature detector and matcher
Number of matches is fixed to 2000; the best 2000 are chosen based on matching score
Missing entries are filled with zeros when there are fewer correspondences
Local and global features extracted from correspondences by consensus learning block
Weights initialized with a 1000-epoch procedure that minimizes the Kullback-Leibler divergence
Gradient clipping used to avoid exploding gradients and accelerate convergence
Training pipeline implemented in PyTorch
Inference uses state-of-the-art components and Gumbel Softmax Sampler
MAGSAC++ model quality function used to select best model
Inner RANSAC-based local optimization and Levenberg-Marquardt numerical optimization used to improve accuracy
Testing algorithm implemented in C++
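
The fixed-size input described above (best 2000 matches, zero padding otherwise) can be sketched as below. The function name `fix_match_count`, the row layout and the convention that a higher score means a better match are assumptions made for illustration.

```python
def fix_match_count(matches, scores, target=2000):
    """Keep the `target` best-scoring matches; pad with zero rows if fewer."""
    if len(matches) != len(scores):
        raise ValueError("matches and scores must have the same length")
    # Assumption: a higher matching score means a better match.
    order = sorted(range(len(matches)), key=lambda i: scores[i], reverse=True)
    kept = [list(matches[i]) for i in order[:target]]
    # Assumed default row layout when there are no matches: (x1, y1, x2, y2).
    width = len(kept[0]) if kept else 4
    padding = target - len(kept)
    kept.extend([0.0] * width for _ in range(padding))
    return kept
```

The output always has exactly `target` rows, so batches of image pairs can be stacked into tensors of equal shape.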

Tested epipolar geometry estimation on 13 scenes from the CVPR IMW 2020 PhotoTourism benchmark
Trained and validated on St. Peter’s Square with 4950 image pairs
Compared ∇-RANSAC to classical robust estimators and state-of-the-art learning-based methods
Retrained NG-RANSAC, CLNet, and OANet on same data
Used SNN ratio, feature scales and orientations as learnable side-information
Pre-filtered correspondences by SNN ratio threshold of 0.8
Used 0.75 pixels as inlier-outlier threshold for robust estimators
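
The SNN pre-filtering step can be sketched as follows, assuming the usual definition of the ratio as the descriptor distance to the nearest neighbour divided by the distance to the second nearest. The tuple layout, the function name and the use of `<=` rather than `<` at the 0.8 threshold are assumptions made for illustration.

```python
def snn_ratio_filter(matches, max_ratio=0.8):
    """Keep matches whose first/second nearest-neighbour ratio is small.

    matches: iterable of (index_a, index_b, d_first, d_second) tuples, where
    d_first and d_second are descriptor distances (hypothetical layout).
    """
    kept = []
    for idx_a, idx_b, d_first, d_second in matches:
        # A zero second distance makes the ratio undefined, so reject it.
        ratio = d_first / d_second if d_second > 0 else float("inf")
        if ratio <= max_ratio:
            kept.append((idx_a, idx_b, ratio))
    return kept
```

The ratio is returned with each surviving match so that it can be reused as side information for the learned components.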

Evaluated method for E estimation with same train, validation, and test scenes as F estimation
Used differentiable 5PC algorithm when training on essential matrix estimation
Trained end-to-end for 10 epochs with weight initialization and gradient clipping
Iteration number fixed to 100
Evaluated the estimated essential matrix by decomposing E into rotation and translation, computing the rotation error R and the translation error t, and reporting their maximum, max(R, t)
Calculated the Area Under the Recall curve (AUC) thresholded at 5°, 10° and 20°
Highest AUC scores achieved by ∇-RANSAC
Among the minimal solvers, the five-point algorithm achieved the highest AUC scores, confirming the necessity of using better minimal solvers than the 8PC algorithm
Weight initialization with Kullback-Leibler divergence improves accuracy
Sampling weights used Sacre Coeur as test set
Epipolar error leads to best results
∇-RANSAC can be used to improve end-to-end feature matching approaches
Best results observed with setup where only LoFTR model is trained and ∇-RANSAC is kept frozen with pre-trained weights
∇-RANSAC leads to most accurate fundamental and essential matrices compared to state-of-the-art robust estimators
Code repository and trained model will be made public
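
The pose-error evaluation described above (the maximum of the rotation and translation errors, followed by AUC at 5°, 10° and 20°) can be sketched as below. This is an illustrative sketch, not the benchmark's official script: the function names are hypothetical, translations are assumed to be non-zero, and the recall curve is treated as a step function, whereas some implementations interpolate linearly and give slightly different values.

```python
import math


def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation R_gt^T R_est."""
    # trace(R_gt^T R_est) equals the sum of element-wise products.
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


def translation_error_deg(t_est, t_gt):
    """Angle in degrees between translation directions (scale is unknown)."""
    dot = sum(a * b for a, b in zip(t_est, t_gt))
    norm = math.sqrt(sum(a * a for a in t_est)) * math.sqrt(sum(b * b for b in t_gt))
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def pose_error_deg(R_est, t_est, R_gt, t_gt):
    """Per-pair error: the larger of the rotation and translation errors."""
    return max(rotation_error_deg(R_est, R_gt),
               translation_error_deg(t_est, t_gt))


def auc_at(errors, threshold):
    """Normalised area under the step-function recall curve on [0, threshold]."""
    return sum(max(0.0, 1.0 - e / threshold) for e in errors) / len(errors)


def auc_report(errors, thresholds=(5.0, 10.0, 20.0)):
    """AUC at each threshold, keyed by the threshold in degrees."""
    return {t: auc_at(errors, t) for t in thresholds}
```

An image pair with zero error contributes 1 to the AUC at every threshold, and a pair whose error exceeds the threshold contributes 0.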