Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Image-based localization is a computer vision challenge
  • Datasets consist of a 3D database and query images
  • Query images and 3D database are usually acquired with different cameras
  • Ground truth poses between query images and 3D database are hard to acquire
  • This paper proposes a dataset with accurate ground truth poses
  • Dataset is evaluated with an image-based localization pipeline

Paper Content

Introduction

  • Image-based localization is used for autonomous vehicles, augmented reality, and robotics
  • Advances in deep learning have improved the precision of image-based localization
  • Two main directions for making benchmarks more demanding for state-of-the-art visual localization algorithms: introducing more challenging datasets and tightening the pose correctness thresholds
  • Assessing pose correctness against tight thresholds requires accurate ground truth
  • Proposed dataset uses 3D laser scanner data to generate query images with exact ground truth poses
  • Proposed dataset is benchmarked with a visual localization pipeline
  • Results show that view synthesis enables generating challenging query images from viewpoints where traditional query image acquisition could not provide reliable ground truth
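
Pose correctness is typically judged by comparing an estimated camera pose with its ground truth pose and checking the position and angular errors against thresholds. The sketch below shows one common way to compute both errors. It assumes camera-to-world rotations and camera centres in metres, and the threshold list THRESHOLDS is a hypothetical example, not the set of values used in the paper.

```python
import numpy as np

# Hypothetical (position in m, angle in degrees) thresholds; not taken from the paper.
THRESHOLDS = [(0.25, 10.0), (0.5, 10.0), (1.0, 10.0)]


def pose_errors(R_est, c_est, R_gt, c_gt):
    """Return (position error in m, angular error in degrees).

    R_* are 3x3 camera-to-world rotations, c_* are camera centres in world coordinates.
    """
    pos_err = float(np.linalg.norm(np.asarray(c_est, float) - np.asarray(c_gt, float)))
    # The rotation angle of the relative rotation R_gt^T R_est is the angular error.
    R_rel = np.asarray(R_gt, float).T @ np.asarray(R_est, float)
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    ang_err = float(np.degrees(np.arccos(cos_angle)))
    return pos_err, ang_err


def success_rate(errors, max_pos, max_ang):
    """Fraction of (pos_err, ang_err) pairs that are within both thresholds."""
    if not errors:
        return 0.0
    ok = sum(1 for p, a in errors if p <= max_pos and a <= max_ang)
    return ok / len(errors)
```

Tightening a threshold, for example from 1.0 m to 0.25 m, is what makes accurate ground truth essential: a ground truth error of a few centimetres then becomes a noticeable fraction of the allowed error.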

Datasets

  • Several datasets have been published for large-scale visual localization
  • Ground truth poses for query images have been acquired in different ways
  • Aachen Day-Night dataset v1.1 uses view synthesis to refine ground truth (GT) poses
  • Proposed dataset uses view synthesis to render query images
  • Proposed approach enables automatic generation of query images with exact ground truth

Algorithms

  • Algorithms for image-based localization can be classified into three categories
  • InLoc proposed a retrieval-based localization pipeline with feature extraction, image retrieval, dense matching and pose estimation
  • PCLoc added a pose correction step to the InLoc pipeline
  • HFNet proposed a monolithic CNN for simultaneous keypoint detection and descriptor extraction, with accuracy similar to InLoc
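
The image retrieval step of an InLoc-style pipeline ranks database images by how similar their global image descriptors are to the query descriptor (InLoc uses NetVLAD descriptors for this). A minimal sketch, assuming the descriptors are already computed as NumPy vectors; the function name top_k_retrieval is hypothetical.

```python
import numpy as np


def top_k_retrieval(query_desc, db_descs, k=10):
    """Return indices of the k database images most similar to the query.

    query_desc: (D,) global descriptor of the query image.
    db_descs:   (N, D) global descriptors of the database images.
    """
    q = np.asarray(query_desc, float)
    db = np.asarray(db_descs, float)
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    sims = db @ q                      # cosine similarity per database image
    k = min(k, len(sims))
    return np.argsort(-sims)[:k]       # best candidates first
```

The returned candidates are what the later dense matching and pose estimation stages operate on, so a query whose correct database views are missing from this shortlist is very hard to recover downstream.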

Visual data acquisition procedure

  • Industrial building with indoor and outdoor areas was laser-scanned
  • Building contains repetitive and textureless surfaces
  • 3D structure contains recurring shapes
  • Most similar to the InLoc dataset, but also covers outdoor areas
  • Laser scanning performed with a Faro Focus 3D scanner
  • Most point clouds contain 27 million points each
  • InLoc data structure used for point clouds, database images, query images and supporting data
  • Each scan sliced into 36 perspective RGBD database images at 1024x768 resolution
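
Slicing a scan into perspective views amounts to choosing a set of camera orientations at the scan position plus a pinhole intrinsic matrix. The sketch below assumes a hypothetical layout of 12 yaw directions (30° apart) times 3 pitch angles (-30°, 0°, 30°), which yields 36 views, and a 60° horizontal field of view. These values and the axis conventions (world z up, OpenCV-style camera with x right, y down, z forward) are assumptions, not details confirmed by this summary.

```python
import numpy as np

# Camera-to-world rotation for yaw = pitch = 0 (assumed convention): the camera looks
# along world +x, camera x (right) maps to world -y, camera y (down) maps to world -z.
R_BASE = np.array([[0.0, 0.0, 1.0],
                   [-1.0, 0.0, 0.0],
                   [0.0, -1.0, 0.0]])


def intrinsics(width=1024, height=768, hfov_deg=60.0):
    """Pinhole intrinsics with square pixels and a centred principal point."""
    f = (width / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])


def view_rotation(yaw_deg, pitch_deg):
    """Camera-to-world rotation: pitch about the camera x axis, then yaw about world z.

    Positive pitch tilts the viewing direction upward.
    """
    y, p = np.radians(yaw_deg), np.radians(pitch_deg)
    yaw = np.array([[np.cos(y), -np.sin(y), 0.0],
                    [np.sin(y), np.cos(y), 0.0],
                    [0.0, 0.0, 1.0]])
    pitch = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(p), -np.sin(p)],
                      [0.0, np.sin(p), np.cos(p)]])
    return yaw @ R_BASE @ pitch


def view_grid(yaw_step=30.0, pitches=(-30.0, 0.0, 30.0)):
    """All (yaw, pitch) view rotations for one scan position."""
    return [view_rotation(yaw, pitch)
            for pitch in pitches
            for yaw in np.arange(0.0, 360.0, yaw_step)]
```

Under these assumptions the focal length is 512 / tan(30°) ≈ 886.8 px, and each of the 36 rotations, paired with the scan position as camera centre, defines one database view.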

Synthesizing query images

  • Query image synthesis is similar to database image generation
  • Virtual camera position and pointing direction are randomly perturbed
  • 3D environment is modified to reflect challenges relevant to practical pose estimation
  • Missing pixels are filled using an iterative, clamping-based interpolation procedure
  • Optional final step modifies the generated 2D view to make positioning more challenging
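
Rendering a query image from the point cloud can be sketched as follows: perturb a virtual camera pose, project all points with a z-buffer so that the nearest point wins each pixel, then fill the remaining holes. Because the rendering camera pose is known exactly, it serves directly as the query's ground truth pose. The perturbation ranges and function names below are hypothetical, and fill_holes uses plain iterative averaging of valid 4-neighbours as a stand-in, not the clamping-based interpolation described in the paper.

```python
import numpy as np


def perturb_pose(R, c, rng, max_shift=0.3, max_angle_deg=10.0):
    """Randomly shift the camera centre and rotate about a random axis (hypothetical ranges)."""
    c_new = np.asarray(c, float) + rng.uniform(-max_shift, max_shift, size=3)
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    theta = np.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    # Rodrigues' formula for a rotation by theta about `axis`.
    R_delta = np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)
    return R_delta @ np.asarray(R, float), c_new


def render(points, colors, R, c, K, width, height):
    """Z-buffer projection of an (N, 3) point cloud into a pinhole camera.

    R is the camera-to-world rotation and c the camera centre. Returns an RGB image
    and a depth map in which empty pixels have depth inf.
    """
    cam = (points - c) @ R                  # row-vector form of R^T (p - c)
    front = cam[:, 2] > 1e-6                # keep points in front of the camera
    cam, cols = cam[front], colors[front]
    proj = cam @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    lin = v[inside] * width + u[inside]
    z, cols = cam[inside, 2], cols[inside]
    # Sort by pixel, then by depth; the first entry per pixel is the nearest point.
    order = np.lexsort((z, lin))
    pix, first = np.unique(lin[order], return_index=True)
    nearest = order[first]
    image = np.zeros((height * width, 3))
    depth = np.full(height * width, np.inf)
    image[pix] = cols[nearest]
    depth[pix] = z[nearest]
    return image.reshape(height, width, 3), depth.reshape(height, width)


def fill_holes(image, depth, iterations=10):
    """Fill empty pixels with the mean of their valid 4-neighbours, repeated.

    Stand-in for the paper's clamping-based interpolation, which is not reproduced here.
    """
    img = image.copy()
    valid = np.isfinite(depth)
    h, w = valid.shape
    for _ in range(iterations):
        vp = np.pad(valid, 1, constant_values=False)
        ip = np.pad(img, ((1, 1), (1, 1), (0, 0)))
        acc = np.zeros_like(img)
        cnt = np.zeros((h, w))
        for dy, dx in ((0, 1), (2, 1), (1, 0), (1, 2)):   # up, down, left, right
            m = vp[dy:dy + h, dx:dx + w]
            acc += ip[dy:dy + h, dx:dx + w] * m[..., None]
            cnt += m
        fill = ~valid & (cnt > 0)
        img[fill] = acc[fill] / cnt[fill][:, None]
        valid |= fill
    return img, valid
```

Resolving visibility with a sort and np.unique keeps the z-buffer deterministic even when many points land on the same pixel, which matters for scans with tens of millions of points.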

Experimental results

  • 338 query images were generated for benchmarking
  • Query images have the same angle of view and resolution as database images
  • 3D point cloud data was lighting-adjusted to simulate a dark environment
  • Occlusions were introduced into most query images
  • Query image viewpoints range from close to floor level to higher altitudes
  • InLoc pose estimation pipeline used to measure localization accuracy
  • TBPos dataset is more challenging than InLoc dataset
  • Pose verification stage has only a minor impact
  • For 66.6% of query images, the top-10 retrieval candidates include at least one database image from the same point cloud scan
  • For successfully localized queries, average position error is 0.10 m and average angular error is 2.26°
  • Easier version of the query image set achieved localization error below 0.02 m and angular error below 0.3°
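
Figures of this kind (average deviations over successful cases, and the share of queries whose top-10 candidates include a database image from the same scan) can be computed from per-query results roughly as follows. The QueryResult fields and the success thresholds passed to summarize are hypothetical, since the exact success criterion is not restated in this summary.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QueryResult:
    pos_err: float                 # position error in metres
    ang_err: float                 # angular error in degrees
    source_scan: str               # hypothetical: ID of the scan the query was synthesized from
    ranked_scans: list = field(default_factory=list)  # scan IDs of retrieved images, best first


def summarize(results, max_pos, max_ang, k=10):
    """Success rate, mean errors over successful queries, and top-k same-scan hit rate."""
    n = len(results)
    ok = [r for r in results if r.pos_err <= max_pos and r.ang_err <= max_ang]
    hits = sum(1 for r in results if r.source_scan in r.ranked_scans[:k])
    return {
        "success_rate": len(ok) / n,
        "mean_pos_err": mean(r.pos_err for r in ok) if ok else float("nan"),
        "mean_ang_err": mean(r.ang_err for r in ok) if ok else float("nan"),
        "top_k_same_scan_rate": hits / n,
    }
```

Averaging only over successful queries, as the reported deviations do, keeps a few gross failures from dominating the precision figure; the success rate then has to be read alongside it.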

Analysis of pose estimation failure cases

  • InLoc pose estimation pipeline fails on the top-left image in Fig. 5 due to distance-based darkening.
  • Manual localization is possible if the brightness of the query image is increased.
  • InLoc fails in the top-center and bottom-center cases due to lack of texture and similar patterns elsewhere in the database.
  • InLoc fails in the top-right case due to overexposure.
  • InLoc fails in the bottom-right case despite a high amount of distinct visual detail.
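
Distance-based darkening, and why raising the brightness can make such a query recognizable again, can be illustrated with a simple attenuation model. The exponential falloff, the light position and the gain are assumptions for illustration only, not the lighting adjustment actually used to build TBPos.

```python
import numpy as np


def darken_by_distance(colors, points, light_pos, falloff=5.0):
    """Scale RGB colours in [0, 1] by exp(-d / falloff), d = distance to a light (assumed model)."""
    d = np.linalg.norm(np.asarray(points, float) - np.asarray(light_pos, float), axis=1)
    return np.asarray(colors, float) * np.exp(-d / falloff)[:, None]


def brighten(image, gain):
    """Apply a global gain and clip the result to [0, 1]."""
    return np.clip(np.asarray(image, float) * gain, 0.0, 1.0)
```

A global gain preserves the relative contrast inside a uniformly darkened region (up to clipping), which is one way to see why a brightened image can be localized manually even though the unmodified dark query defeats the pipeline.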

Conclusions

  • Proposed novel open dataset, TBPos, for image-based large-scale precision localization
  • Adopted approach of query image synthesis to achieve exact ground truth for localization algorithm benchmarking
  • TBPos benchmarked using InLoc localization pipeline and compared to InLoc dataset
  • TBPos is significantly more challenging than InLoc
  • Measured both the conventional localization success rate and the metric precision of image-based localization

Figures and tables

  • Six database images extracted from a single point cloud scan
  • Proposed procedure for query image synthesis
  • Examples of TBPos query images where the InLoc pose estimation pipeline fails
  • Number of query images synthesized for the experiments presented in this work