Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Image based localization is a computer vision challenge
- Datasets consist of a 3D database and query images
- Query images and 3D database are usually acquired with different cameras
- Ground truth poses between query images and 3D database are hard to acquire
- This paper proposes a dataset with accurate ground truth poses
- Dataset is evaluated with an image-based localization pipeline
Paper Content
Introduction
- Image based localization is used for autonomous vehicles, augmented reality, and robotics
- Advances in deep learning have improved the precision of image based localization
- Two main directions of development to increase the level of challenge for state-of-the-art visual localization algorithms: introducing more challenging datasets and reducing the pose correctness threshold
- Accurate ground truth is necessary for pose correctness assessment
- Proposed dataset uses 3D laser scanner data to generate query images and exact ground truth poses
- Benchmarking of the proposed dataset using a visual localization pipeline
- Results show that view synthesis enables generating challenging query images from viewpoints where traditional query image acquisition could not provide reliable ground truth
Datasets
- Several datasets have been published for large-scale visual localization
- Ground truth poses for query images have been acquired in different ways
- Aachen Day-Night dataset v1.1 uses view synthesis for refinement of GT poses
- Proposed dataset uses view synthesis to render query images
- Proposed approach enables automatic generation of query images with exact ground truth
Algorithms
- Algorithms for image based localization can be classified into 3 categories
- InLoc proposed a retrieval-based localization pipeline with feature extraction, image retrieval, dense matching and pose estimation
- PCLoc added a pose correction step to the InLoc pipeline
- HFNet proposed a monolithic CNN for simultaneous keypoint detection and extraction, with similar accuracy to InLoc
Visual data acquisition procedure
- Laser-scanned an industrial building with inside and outside areas
- Building contains repeating and absent textures
- 3D structure contains recurring shapes
- Most similar to InLoc dataset, but covers outside areas
- Laser scanning performed with Faro Focus 3D scanner
- Point clouds mostly contain 27 million points each
- Data structure of InLoc used for point clouds, database images, query images and supporting data
- Database images sliced into 36 perspective RGBD images with 1024x768 resolution
Synthesizing query images
- Query image synthesis is similar to database image generation
- Virtual camera position and pointing direction are randomly perturbed
- 3D environment is modified to reflect challenges relevant to practical pose estimation
- Missing pixels are filled using an iterative, clamping based interpolation procedure
- Optional last step is to modify the generated 2D view to increase the positioning challenge
Experimental results
- 338 query images were generated for benchmarking purposes
- Query images have same view angle and resolution as database images
- 3D point cloud data was lighting-adjusted to simulate a dark environment
- Occlusions were introduced to most query images
- Query images range from close-to-floor level to higher altitudes
- InLoc pose estimation pipeline used to acquire localization accuracy
- TBPos dataset is more challenging than InLoc dataset
- Impact of pose verification stage is minor
- 66.6% of query images have at least one database image from same point cloud scan in top-10 best candidates
- Average location deviation is 0.10 m and angular deviation is 2.26° for success cases
- Easier version of query image set achieved localization accuracy below 0.02 m and angular accuracy below 0.3°
Analysis of pose estimation failure cases
- InLoc pose estimation pipeline fails in Fig. 5 top-left image due to distance-based darkening.
- Manual localization possible if brightness of query image is increased.
- InLoc fails in top-center and bottom-center cases due to lack of texture and similar patterns in database.
- InLoc fails in top-right case due to overexposure.
- InLoc fails in bottom-right case despite high amount of distinct visual detail.
Conclusions
- Proposed novel open dataset, TBPos, for image based large-scale precision localization
- Adopted approach of query image synthesis to achieve exact ground truth for localization algorithm benchmarking
- TBPos benchmarked using InLoc localization pipeline and compared to InLoc dataset
- TBPos is significantly more challenging than InLoc
- Measuring conventional localization success rate and metric precision of image-base localization
- 6 database images extracted from single point cloud scan
- Proposed procedure for query image synthesis
- Examples of TBPos query images where InLoc pose estimation pipeline fails
- Number of query images synthesized for experiments presented in this work