Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Traditional mapping methods have difficulty balancing memory consumption and accuracy.
  • This paper proposes a 3D LiDAR-based mapping method using an octree-based hierarchical structure.
  • The features are optimized with 3D measurements and a binary cross entropy loss.
  • The mapping system is designed to prevent catastrophic forgetting.
  • Experiments show that the proposed method is more accurate, complete, and memory-efficient than current methods.

Paper Content

I. introduction

  • Localization and navigation in large-scale outdoor scenes is a common task of mobile robots
  • Accurate and dense 3D map of the environment is necessary
  • Current large-scale mapping methods use spatial grids or tree structures
  • Neural network-based representations are becoming popular
  • Little has been done in the context of LiDAR data
  • SHINE-Mapping is a novel approach for large-scale incremental 3D mapping
  • SHINE-Mapping uses an octree-based sparse data structure and a shared shallow MLP
  • SHINE-Mapping is more accurate and complete than non-learning-based mapping methods
  • 3D LiDAR point clouds are used for environment mapping, localization, navigation, visualization, and augmented reality
  • Common representations for environment mapping include surfel-based, triangle meshes, and octree-based occupancy
  • Volumetric integration methods are popularized by Newcombe et al.
  • Neural representations, like NeRF, are used for novel view synthesis
  • Implicit representations represent the environment via multilayer perceptrons
  • Incremental mapping with implicit representation is a continual learning problem
  • Memory cost of dense voxel structures is reduced by octree-based sparse feature grid
  • Feature update regularization is used to achieve incremental mapping with limited memory

Iii. our approach -shine-mapping

  • Proposed framework for large-scale 3D mapping
  • Takes point clouds from range sensor (e.g. LiDAR) with known poses as input
  • Uses learnable octree-based hierarchical feature grids and MLP decoder to represent SDF of environment
  • Optimizes local feature vectors online to capture local geometry
  • Generates explicit geometric representation (triangle mesh) for visualization and evaluation

A. implicit neural map representation

  • Our implicit map representation stores spatially located features in 3D world.
  • We use a neural network to infer SDF values from these features.
  • We combine features from H different resolutions.
  • We use an octree-based map representation and hash tables to store features.
  • We use Morton codes to quickly find features of upper levels.

B. training pairs and loss function

  • Range sensors such as LiDARs provide accurate range measurements.
  • Training pairs are obtained by sampling points along the ray and using the signed distance from the sampled point to the beam endpoint as the supervision signal.
  • The regions of interest are the values close to zero as they define the surfaces.
  • Binary cross entropy (BCE) is used as the loss function, with an Eikonal term added to encourage accurate signed distance values.

C. incremental mapping without forgetting

  • Catastrophic forgetting happens when using feature grid-based implicit incremental mapping
  • Network focuses on reducing loss generated in current area, not previous area
  • As grid size increases, forgetting problem becomes more severe
  • Solution is to limit update direction of local feature vector
  • Regularization term added to loss function to prevent gradient explosion

Iv. experimental evaluation

  • Incremental and scalable 3D mapping system
  • Uses sparse hierarchical feature grid and neural decoder

A. experimental setup

  • Evaluated model on two publicly available outdoor LiDAR datasets with near ground truth mesh information
  • One dataset is a sequence of simulated LiDAR scans, the other is a non-simulated dataset with cm-level measurement noise and motion distortion
  • Evaluated mapping accuracy, completeness, and memory efficiency
  • Compared results to previous methods
  • Validated scalability for incremental mapping
  • Showcased high-fidelity 3D reconstruction indoors

B. mapping quality

  • Evaluated mapping quality in terms of accuracy and completeness
  • Compared approach against 3 other mapping systems
  • Implemented differentiable rendering
  • Used reconstruction metrics to assess results
  • Results show superiority of approach in terms of accuracy and completeness

C. memory efficiency

  • Our method creates maps with smaller memory usage than Voxblox and VDB Fusion
  • Our method maintains good mapping quality even with lower feature grid resolution, while Voxblox and VDB Fusion have significantly increased mapping error

D. scalable incremental mapping

  • SHINE-Mapping can scale to larger environments
  • Used KITTI dataset to showcase this
  • Reconstructed a driving sequence over 4 km
  • Qualitative comparison between incremental mapping with and without feature update regularization

E. indoor mapping and filling occluded areas

  • Presented a novel approach to large-scale 3D SDF mapping using range sensors
  • Used an octree-based implicit representation consisting of features stored in hash tables
  • Network and features can be learned end-to-end from range data
  • Evaluated on simulated and real-world datasets
  • Advantages over current state-of-the-art mapping systems
  • More accurate and complete 3D reconstruction with lower map memory than compared methods
  • Can provide a reasonable guess about the structure for regions not covered by the sensor