Abstract
- The Pi3D dataset consists of roughly 1000 planes observed in 10,000 images from the 1DSfM dataset
- The HEB dataset consists of 226,260 homographies and roughly 4M correspondences
- Applications of the Pi3D dataset include training and evaluating monocular depth estimation, surface normal estimation, and image matching algorithms
- The HEB dataset is used to evaluate robust estimators and deep learning-based correspondence filtering methods
Paper Content
Introduction
- A planar homography is a projective mapping between images of coplanar 3D points
- A homography encodes the intrinsic and extrinsic camera parameters and the parameters of the underlying 3D plane
- Homography plays an important role in multiple view geometry, calibration, metric rectification, augmented reality, optical flow, video stabilization, and Structure-from-Motion
- The traditional approach to finding homographies in image pairs consists of two stages: feature point detection and matching, followed by robust estimation
- Existing datasets for evaluating homography estimators are Homogr, ExtremeView, and HPatches
- The proposed datasets are Pi3D, consisting of 1046 large planes in 3D, and HEB, containing 226,260 homographies
- The dataset can also be used to evaluate the uncertainty of partially or fully affine-covariant feature detectors
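The statement that a homography encodes both camera and plane parameters can be made concrete with the standard plane-induced homography H = K2 (R + t n^T / d) K1^-1. The sketch below assumes a plane n^T X = d expressed in camera-1 coordinates and a relative pose X2 = R X + t; the helper names (`rot_y`, `plane_induced_homography`, `project`) and the numeric camera setup are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rot_y(theta):
    """Rotation matrix about the camera y axis (angle in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def plane_induced_homography(K1, K2, R, t, n, d):
    """Homography mapping image-1 pixels to image-2 pixels for points on the plane n^T X = d.

    Assumed convention: X is in camera-1 coordinates, camera 2 sees X2 = R @ X + t.
    """
    H = K2 @ (R + np.outer(t, n) / d) @ np.linalg.inv(K1)
    return H / H[2, 2]  # remove the arbitrary projective scale

def project(K, X):
    """Pinhole projection of a 3D point in camera coordinates to pixels."""
    x = K @ X
    return x[:2] / x[2]

# Illustrative setup (not from the paper): shared intrinsics, plane z = 5 in camera 1.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = rot_y(np.deg2rad(10.0))
t = np.array([-1.0, 0.0, 0.2])
n = np.array([0.0, 0.0, 1.0])
d = 5.0

H = plane_induced_homography(K, K, R, t, n, d)
X = np.array([1.0, -0.5, 5.0])        # a point on the plane (n^T X = d)
x1 = project(K, X)                    # its pixel in image 1
x2 = project(K, R @ X + t)            # its pixel in image 2
x2_h = H @ np.append(x1, 1.0)
x2_pred = x2_h[:2] / x2_h[2]          # should match x2 up to floating-point error
```

Changing `n` or `d` changes `H` while `K`, `R`, and `t` stay fixed, which is why a single homography constrains the plane as well as the cameras.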
Planes in 3D dataset
- A dataset of 3D planes in scenes that each consist of thousands of real-world photos
Homography evaluation benchmark
- Tentative correspondences are obtained from mutually nearest RootSIFT matches
- The input to an estimator is a set of N tentative correspondences
- The dataset is split into two disjoint parts: a training set (2 scenes) and a test set (9 scenes)
- 80% of the homographies have an inlier ratio of at most 0.1
- Histograms of the angle between the relative translations and the plane normals show that all possible directions are well covered
- 30% of the homographies have fewer than 20 inliers
- The majority of homographies have fewer than 50 inliers
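Tentative correspondences from mutually nearest RootSIFT matches can be sketched in NumPy as below. RootSIFT here is the usual recipe (L1-normalize each descriptor, then take the element-wise square root); the function names are hypothetical, the descriptors are random stand-ins for real SIFT output, and the brute-force distance matrix is only practical for small descriptor sets.

```python
import numpy as np

def root_sift(desc, eps=1e-7):
    """RootSIFT: L1-normalize each descriptor, then take the element-wise square root."""
    desc = np.abs(np.asarray(desc, dtype=np.float64))  # SIFT entries are non-negative
    desc = desc / (desc.sum(axis=1, keepdims=True) + eps)
    return np.sqrt(desc)

def mutual_nearest_neighbors(desc1, desc2):
    """Return (i, j) pairs where desc2[j] is the nearest neighbor of desc1[i] and vice versa."""
    # Pairwise squared Euclidean distances, shape (N1, N2)
    d2 = (np.sum(desc1**2, axis=1)[:, None]
          + np.sum(desc2**2, axis=1)[None, :]
          - 2.0 * desc1 @ desc2.T)
    nn12 = np.argmin(d2, axis=1)  # best match in image 2 for each feature in image 1
    nn21 = np.argmin(d2, axis=0)  # best match in image 1 for each feature in image 2
    idx1 = np.arange(len(desc1))
    mutual = nn21[nn12] == idx1   # keep only matches that agree in both directions
    return np.stack([idx1[mutual], nn12[mutual]], axis=1)

# Random stand-in descriptors: desc2[k] is a slightly perturbed copy of desc1[perm[k]].
rng = np.random.default_rng(0)
desc1 = rng.random((50, 128))
perm = rng.permutation(50)
desc2 = desc1[perm] + 1e-3 * rng.random((50, 128))
matches = mutual_nearest_neighbors(root_sift(desc1), root_sift(desc2))
```

The mutual check discards one-directional matches, which removes many outliers cheaply but, as the inlier statistics above show, still leaves most HEB pairs with low inlier ratios.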
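Why low inlier ratios make HEB hard can be seen from the standard RANSAC iteration bound k = log(1 - p) / log(1 - w^m), where w is the inlier ratio, p the desired confidence, and m = 4 the minimal sample size for a homography. The function below is a minimal sketch of that textbook formula; the 0.99 confidence is an assumed default, not a value from the paper.

```python
import math

def ransac_iterations(inlier_ratio, confidence=0.99, sample_size=4):
    """Iterations needed to draw at least one all-inlier minimal sample with the given confidence.

    A homography needs sample_size = 4 correspondences.
    """
    p_good = inlier_ratio ** sample_size  # probability that one random sample is all inliers
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))

for w in (0.5, 0.25, 0.1, 0.05):
    print(f"inlier ratio {w:.2f}: {ransac_iterations(w)} iterations")
```

At an inlier ratio of 0.5 the bound is 72 iterations, while at 0.1, which 80% of HEB homographies do not exceed, it grows to roughly 46,000; this is the regime where guided sampling and prefiltering pay off.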
Experimental protocol
- The evaluation protocol is largely influenced by the Image Matching Benchmark
- Metrics include pose-based, ground-truth-correspondence-based, and self-supervised ones
- The main metric is the mean average accuracy (mAA) computed over a range of error thresholds
- A comparison of the metrics shows that they mostly agree, with some exceptions
- Training and test protocols are proposed to ensure a fair evaluation
- Hyper-parameters are tuned on the training set and then fixed for the test set
- The evaluated methods for homography estimation include traditional algorithms and methods with deep prefiltering
- Traditional algorithms include OpenCV RANSAC, LMEDS, LSQ, RHO, MAGSAC++, Graph-Cut RANSAC, scikit-image RANSAC, EAS, kornia-CPU
- Deep prefiltering uses pre-trained models to filter correspondences before robust estimation
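The mAA metric can be sketched as the accuracy (fraction of image pairs whose error is within a threshold) averaged over all thresholds. The threshold range of 1 to 10 degrees and the error values below are assumptions for illustration; the exact thresholds and error definitions used for HEB are not restated in this summary.

```python
import numpy as np

def mean_average_accuracy(errors, thresholds):
    """mAA: fraction of errors at or below each threshold, averaged over the thresholds."""
    errors = np.asarray(errors, dtype=np.float64)
    return float(np.mean([np.mean(errors <= t) for t in thresholds]))

# Hypothetical pose errors (degrees) for three image pairs; thresholds 1..10 deg are assumed.
pose_errors = [0.5, 2.5, 20.0]
maa = mean_average_accuracy(pose_errors, thresholds=np.arange(1, 11))
```

Averaging over thresholds rewards methods that are accurate at strict tolerances without discarding pairs that are only roughly correct, which a single threshold would do.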
Experiments
- Among the traditional methods, Affine GC-RANSAC is the most accurate
- PROSAC sampling improves the results by up to 10 percentage points
- An optimized implementation matters for speed
- SNN ratio filtering reduces the differences between methods
- The RHO algorithm is the fastest
- Affine GC-RANSAC benefits least from correspondence prefiltering
- LSQ fitting and LMEDS yield inaccurate results
- The EAS algorithm leads to highly inaccurate results
- Deep prefiltering boosts the accuracy of advanced RANSAC variants
- OANet provides the best results
- Vanilla OpenCV RANSAC with OANet or CLNet prefiltering performs similarly to VSAC with SNN ratio filtering
- The uncertainty of SIFT keypoints is approximately 1/3 pixel
- The standard deviations of the angular, scale, and positional transformations of detected correspondences are approximately 12°, 0.51, and 0.67 pixels, respectively
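SNN ratio filtering and PROSAC-style ordering both rely on the second-nearest-neighbor (SNN) ratio: the distance to the best match divided by the distance to the second best. The sketch below computes the ratio, filters with an assumed threshold of 0.8 (a commonly used value, not necessarily the one tuned in the paper), and orders the survivors so the most distinctive matches come first. `progressive_samples` is a simplified, hypothetical stand-in for PROSAC: it grows the sampling pool linearly rather than using PROSAC's actual growth function and termination criteria.

```python
import numpy as np

def snn_ratios(desc1, desc2):
    """Nearest neighbor in desc2 and SNN ratio for each row of desc1 (lower = more distinctive)."""
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)  # (N1, N2) distances
    two_best = np.sort(d, axis=1)[:, :2]
    nn = np.argmin(d, axis=1)
    return nn, two_best[:, 0] / np.maximum(two_best[:, 1], 1e-12)

def progressive_samples(order, num_iters, sample_size=4, rng=None):
    """Simplified progressive sampling in the spirit of PROSAC (not its exact schedule).

    Iteration k draws a minimal sample from the top-n ranked correspondences,
    where n grows linearly from sample_size to len(order).
    """
    if rng is None:
        rng = np.random.default_rng()
    N = len(order)
    for k in range(num_iters):
        n = min(N, sample_size + (k * (N - sample_size)) // max(num_iters - 1, 1))
        yield order[rng.choice(n, size=sample_size, replace=False)]

# Random stand-ins: the first 30 rows of desc2 match desc1, the rest are distractors.
rng = np.random.default_rng(0)
desc1 = rng.random((30, 128))
desc2 = np.vstack([desc1 + 0.01 * rng.random((30, 128)), rng.random((30, 128))])
nn, ratio = snn_ratios(desc1, desc2)
keep = np.flatnonzero(ratio < 0.8)      # ratio-test filtering (0.8 is an assumed threshold)
order = keep[np.argsort(ratio[keep])]   # most distinctive matches first
samples = list(progressive_samples(order, num_iters=100, rng=rng))
```

Early samples come only from the best-ranked matches, so when the ranking correlates with correctness an all-inlier sample tends to appear far sooner than under uniform sampling.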
Conclusion
- The Pi3D and HEB datasets are presented
- The datasets have diverse applications, e.g., training or evaluating monocular depth estimation, surface normal estimation, and image matching algorithms
- VSAC and OANet achieve the top accuracy
- PROSAC accelerates RANSAC by an order of magnitude
- Exploiting SIFT orientation and scale brings clear benefits in Affine GC-RANSAC
- The dataset, including the reconstruction with absolute scale, and tools for adding new features will be made available
Supplementary material
- The large number of homographies allows for analyzing the noise in partially or fully affine-covariant features
- DoG features are evaluated
- The actual noise in the orientation and scale components of such features is investigated
- The components of each algorithm compared in the main paper are described
- All tested methods use the normalized direct linear transformation (DLT) algorithm
- Methods covered: OpenCV RANSAC, LMEDS, LSQ, LO-RANSAC, LO-RANSAC+ with LAF, pydegensac, OpenCV GC-RANSAC and MAGSAC++, VSAC with PROSAC, RHO, CNe, ACNe, DFE, OANet, Neural Guiding, CLNet
- The bias and variance of the angular, scale, and positional transformations of detected SIFT correspondences are evaluated
- The standard deviation of the symmetric positional residuals is measured for individual bins with respect to the related scales
- The accuracy and uncertainty of the scale transformation are evaluated
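The normalized DLT that all tested methods share can be sketched as Hartley normalization (translate points to their centroid, scale so the mean distance is sqrt(2)), a linear system solved by SVD, and denormalization. This is the textbook version under the convention dst ~ H @ src; the helper names are hypothetical, and the implementations benchmarked in the paper may differ in details such as degeneracy checks or weighting.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: centroid to the origin, mean distance to sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return (T @ pts_h.T).T, T

def normalized_dlt(src, dst):
    """Estimate H with dst ~ H @ src from >= 4 correspondences given as (N, 2) arrays."""
    src_n, T1 = _normalize(np.asarray(src, dtype=np.float64))
    dst_n, T2 = _normalize(np.asarray(dst, dtype=np.float64))
    rows = []
    for (x, y, _), (u, v, _) in zip(src_n, dst_n):
        # Two linear constraints per correspondence on the 9 entries of H
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # Solution: right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    Hn = Vt[-1].reshape(3, 3)
    H = np.linalg.inv(T2) @ Hn @ T1  # undo both normalizations
    return H / H[2, 2]

def apply_h(H, pts):
    """Map (N, 2) pixel coordinates through H."""
    q = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return q[:, :2] / q[:, 2:3]

# Synthetic check with an illustrative ground-truth homography and noise-free points.
H_true = np.array([[1.2, 0.1, 30.0], [-0.05, 0.9, 10.0], [1e-4, 2e-4, 1.0]])
rng = np.random.default_rng(1)
src = rng.random((10, 2)) * [640.0, 480.0]
dst = apply_h(H_true, src)
H_est = normalized_dlt(src, dst)
```

Without the normalization step, pixel coordinates in the hundreds make the linear system badly conditioned, which is why the normalized variant is the standard choice for both minimal and least-squares fitting.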
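One plausible way to measure the positional, angular, and scale residuals of a detected correspondence against a ground-truth homography is to linearize the homography at the keypoint: its 2x2 Jacobian is the local affine map, which predicts how the feature's orientation rotates and how its scale changes (by sqrt(|det J|)). The function names and exact residual definitions below are assumptions for illustration; the paper's own definitions (e.g., symmetric residuals or log-scale ratios) may differ.

```python
import numpy as np

def homography_jacobian(H, pt):
    """2x2 Jacobian of the mapping x -> H(x) at pixel pt: the local affine approximation."""
    p = np.array([pt[0], pt[1], 1.0])
    w = H[2] @ p
    u, v = (H[0] @ p) / w, (H[1] @ p) / w
    return np.array([[H[0, 0] - u * H[2, 0], H[0, 1] - u * H[2, 1]],
                     [H[1, 0] - v * H[2, 0], H[1, 1] - v * H[2, 1]]]) / w

def feature_residuals(H, pt1, angle1, scale1, pt2, angle2, scale2):
    """Residuals of a detected correspondence w.r.t. what H predicts.

    Angles are in radians; a scale residual of 1.0 means the scales agree exactly.
    """
    p = H @ np.array([pt1[0], pt1[1], 1.0])
    pos_res = np.asarray(pt2, dtype=np.float64) - p[:2] / p[2]
    A = homography_jacobian(H, pt1)
    direction = A @ np.array([np.cos(angle1), np.sin(angle1)])  # mapped orientation vector
    pred_angle = np.arctan2(direction[1], direction[0])
    ang_res = np.angle(np.exp(1j * (angle2 - pred_angle)))  # wrapped to (-pi, pi]
    scale_res = scale2 / (scale1 * np.sqrt(abs(np.linalg.det(A))))
    return pos_res, ang_res, scale_res

# Illustrative similarity transform: rotation 0.3 rad, scale 2, translation (5, -3).
theta = 0.3
H = np.array([[2 * np.cos(theta), -2 * np.sin(theta), 5.0],
              [2 * np.sin(theta), 2 * np.cos(theta), -3.0],
              [0.0, 0.0, 1.0]])
pt1, angle1, scale1 = (10.0, 20.0), 0.1, 1.5
p = H @ np.array([pt1[0], pt1[1], 1.0])
pos_res, ang_res, scale_res = feature_residuals(
    H, pt1, angle1, scale1, p[:2] / p[2], angle1 + theta, 2 * scale1)
```

Collecting these residuals over many inlier correspondences and taking their means and standard deviations gives bias and variance estimates of the kind summarized above for SIFT.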