Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Image-based localization is used for autonomous vehicles, augmented reality, and robotics
Advances in deep learning have improved the precision of image-based localization
Two main directions of development raise the level of challenge for state-of-the-art visual localization algorithms: introducing more challenging datasets and reducing the pose correctness threshold
Accurate ground truth is necessary for pose correctness assessment
Proposed dataset uses 3D laser scanner data to generate query images and exact ground truth poses
Benchmarking of the proposed dataset using a visual localization pipeline
Results show that view synthesis enables generating challenging query images from viewpoints where traditional query image acquisition could not provide reliable ground truth

Several datasets have been published for large-scale visual localization
Ground truth poses for query images have been acquired in different ways
Aachen Day-Night dataset v1.1 uses view synthesis for refinement of GT poses
Proposed dataset uses view synthesis to render query images
Proposed approach enables automatic generation of query images with exact ground truth

Algorithms for image-based localization can be classified into three categories
InLoc proposed a retrieval-based localization pipeline with feature extraction, image retrieval, dense matching and pose estimation
PCLoc added a pose correction step to the InLoc pipeline
HFNet proposed a monolithic CNN for simultaneous keypoint detection and descriptor extraction, with similar accuracy to InLoc

Laser-scanned an industrial building with inside and outside areas
Building contains repeating and absent textures
3D structure contains recurring shapes
Most similar to InLoc dataset, but covers outside areas
Laser scanning performed with Faro Focus 3D scanner
Most point clouds contain 27 million points each
Data structure of InLoc used for point clouds, database images, query images and supporting data
Each scan is sliced into 36 perspective RGBD database images at 1024x768 resolution
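
The slicing of one scan into 36 perspective views can be illustrated with a short sketch. The source does not say how the 36 viewing directions are distributed, so the layout below (12 yaw angles at 30° steps, combined with pitch angles of -30°, 0°, and +30°) is an assumption, as are the Z-up axis convention and the function name `view_directions`.

```python
import math

def view_directions(n_yaw=12, pitches_deg=(-30.0, 0.0, 30.0)):
    """Return (yaw_deg, pitch_deg, unit_vector) for each virtual camera view.

    The 12 x 3 layout is an assumed example, not taken from the source.
    """
    views = []
    for pitch in pitches_deg:
        for i in range(n_yaw):
            yaw = i * 360.0 / n_yaw
            y, p = math.radians(yaw), math.radians(pitch)
            # Assumed Z-up convention: yaw rotates about Z, pitch tilts toward +Z.
            direction = (
                math.cos(p) * math.cos(y),
                math.cos(p) * math.sin(y),
                math.sin(p),
            )
            views.append((yaw, pitch, direction))
    return views

views = view_directions()
print(len(views))  # number of perspective views per scan
```

Each direction would then be paired with a 1024x768 virtual camera to render one RGBD image from the scan's point cloud.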

Query image synthesis is similar to database image generation
Virtual camera position and pointing direction are randomly perturbed
3D environment is modified to reflect challenges relevant to practical pose estimation
Missing pixels are filled using an iterative, clamping based interpolation procedure
Optional last step is to modify the generated 2D view to increase the positioning challenge
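
The hole-filling step can be sketched as follows. The source gives no details of the interpolation or of what is clamped, so this is only one possible reading: missing pixels (marked `None`) are filled pass by pass with the mean of their valid 8-neighbours, and neighbour coordinates are clamped to the image bounds. The function name `fill_missing` and the `max_passes` parameter are hypothetical.

```python
def fill_missing(image, max_passes=100):
    """Fill None pixels with the mean of their valid 8-neighbours, pass by pass.

    Assumed interpretation of the iterative, clamping-based interpolation;
    not the paper's confirmed procedure.
    """
    h, w = len(image), len(image[0])
    img = [row[:] for row in image]  # work on a copy
    for _ in range(max_passes):
        holes = [(r, c) for r in range(h) for c in range(w) if img[r][c] is None]
        if not holes:
            break
        updates = {}
        for r, c in holes:
            vals = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Clamp neighbour coordinates to the image bounds.
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    if img[rr][cc] is not None:
                        vals.append(img[rr][cc])
            if vals:
                updates[(r, c)] = sum(vals) / len(vals)
        if not updates:
            break  # no valid pixel to propagate from
        # Apply all updates at once so each pass only reads the previous pass.
        for (r, c), v in updates.items():
            img[r][c] = v
    return img

sparse = [
    [10.0, None, 30.0],
    [None, None, None],
    [50.0, None, 70.0],
]
filled = fill_missing(sparse)
```

Applying the updates only at the end of each pass keeps the result independent of the pixel traversal order; holes far from any valid pixel are reached in later passes.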

338 query images were generated for benchmarking purposes
Query images have same view angle and resolution as database images
3D point cloud data was lighting-adjusted to simulate a dark environment
Occlusions were introduced to most query images
Query images range from close-to-floor level to higher altitudes
InLoc pose estimation pipeline used to measure localization accuracy
TBPos dataset is more challenging than InLoc dataset
Impact of pose verification stage is minor
66.6% of query images have at least one database image from same point cloud scan in top-10 best candidates
Average location deviation is 0.10 m and angular deviation is 2.26° for successful cases
Easier version of query image set achieved localization accuracy below 0.02 m and angular accuracy below 0.3°
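
Location and angular deviations of this kind are commonly computed from an estimated pose and the ground truth pose as sketched below. The source does not give its exact formulas, so the Euclidean distance between camera positions and the standard rotation angle, arccos((trace(R_gt^T R_est) - 1) / 2), are assumptions here, and the helper names are hypothetical.

```python
import math

def position_error(t_est, t_gt):
    """Euclidean distance between estimated and ground truth camera positions."""
    return math.dist(t_est, t_gt)

def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_gt^T R_est) equals the sum of element-wise products.
    trace = sum(R_gt[i][j] * R_est[i][j] for i in range(3) for j in range(3))
    # Clamp against rounding errors before taking the arccosine.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def rot_z(deg):
    """Rotation about the Z axis, used here only to build example poses."""
    a = math.radians(deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]

# Illustrative poses only; these are not data from the paper.
pos_err = position_error((1.0, 2.0, 0.5), (1.1, 2.0, 0.5))
rot_err = rotation_error_deg(rot_z(12.26), rot_z(10.0))
print(f"{pos_err:.2f} m, {rot_err:.2f} deg")
```

A query would count as successfully localized when both errors fall below the chosen pose correctness thresholds.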

InLoc pose estimation pipeline fails in Fig. 5 top-left image due to distance-based darkening.
Manual localization possible if brightness of query image is increased.
InLoc fails in top-center and bottom-center cases due to lack of texture and similar patterns in database.
InLoc fails in top-right case due to overexposure.
InLoc fails in bottom-right case despite high amount of distinct visual detail.

Proposed novel open dataset, TBPos, for image-based large-scale precision localization
Adopted approach of query image synthesis to achieve exact ground truth for localization algorithm benchmarking
TBPos benchmarked using InLoc localization pipeline and compared to InLoc dataset
TBPos is significantly more challenging than InLoc
Measuring conventional localization success rate and metric precision of image-based localization
6 database images extracted from single point cloud scan
Proposed procedure for query image synthesis
Examples of TBPos query images where InLoc pose estimation pipeline fails
Number of query images synthesized for experiments presented in this work