Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Recovery of scene geometry from multiview images is a challenge in computer vision research.
  • Recent methods leverage neural implicit surface learning and differentiable volume rendering.
  • Traditional multi-view stereo can recover geometry of scenes with rich textures.
  • HelixSurf intertwines regularization from two strategies during learning process.
  • HelixSurf is efficient and faster than existing methods.

Paper Content

Introduction

  • Challenge in computer vision research: surface reconstruction from multi-view images
  • Different paradigms of methods exist to address the challenge
  • Multi-view stereo (MVS) methods recover properties of surface points by optimizing local pixel-wise correspondences
  • Differentiable volume rendering connects observed images with neural modeling of implicit surface and radiance field
  • HelixSurf combines MVS and neural implicit surface learning to regularize learning/optimization of one strategy using the other
  • Regularizes learning on textureless surface areas by leveraging region-wise homogeneity of superpixels
  • Adaptive point sampling along rays improves efficiency of differentiable volume rendering

Patchmatch based multi-view stereo

  • 3D reconstruction from posed multi-view images is a challenging task in computer vision
  • PatchMatch based Multi-view Stereo (PM-MVS) is traditionally the most explored technique
  • PM-MVS methods represent the geometry with depth and/or normal maps
  • Depth and/or normal of each pixel is estimated by exploiting inter-image photometric and geometric consistency
  • Filtering operations are used to fuse all the depth maps into a global point cloud
  • Meshing algorithms are used to recover complete surface
  • Traditional methods have achieved great success but can produce artifacts and missing parts in textureless areas
  • Deep learning-based MVS methods have demonstrated promising performance but rely on ground-truth 3D data for supervision

Neural implicit surface

  • Recent works use neural networks to implicitly represent surfaces
  • Neural networks output signed/unsigned distance fields or occupancy fields
  • Surface rendering is used to reconstruct 3D shapes from 2D images
  • Differentiable volume rendering techniques are used to eliminate the need of masks
  • Follow-up works improve geometry quality with fine-grained surface details
  • Deep networks suffer from smoothness bias which discourages them to regularize learning and recover fine details
  • Recent works incorporate geometric cues from pre-trained models to get rid of this dilemma
  • HelixSurf integrates traditional PM-MVS and neural implicit learning surface for better results

Preliminary

  • Neural Implicit Surface Representation is a way to encode a continuous surface as the zero-level set of a signed distance field
  • DeepSDF is a parameterized MLP used to represent the surface
  • Differentiable volume rendering is used to synthesize novel views
  • NeRF models a continuous scene space as a neural radiance field
  • Volume density is modeled as a transformed function of the implicit SDF function
  • Multi-View Stereo with PatchMatch is used to recover the scene geometry
  • PatchMatch is used to establish pixel-wise correspondences across multiview images
  • Color similarity and forward-backward reprojection error are used to evaluate the geometry
  • Probability is used for view selection and Monte-Carlo view sampling is used to draw samples

Helixsurf for intertwined regularization of neural implicit surface learning

  • Task is to reconstruct scene geometry with fine details
  • MLP based radiance field function F connects scene geometry with image observations
  • SDF-induced volume density used to reconstruct surface
  • MLP based function has deep priors that induce continuous and piece-wise, smooth surface
  • PatchMatch based MVS methods couple predictions of {d l , n l } for individual pixels in probabilistic framework
  • Propose integrated solution that takes advantages of both strategies
  • Iterative intertwined regularization used during learning process

Regularization of neural implicit surface learning from mvs predictions

  • Neural implicit surface learning samples rays in 3D space
  • Rays emanate from camera center and pass through a pixel in an image
  • Loss defines color based image supervision from ray for learning
  • Depth and surface normal can be computed from MVS prediction

Handling of textureless surface areas

  • PatchMatch based MVS methods are reliable on texture-rich surface areas.
  • Other sources are used to regularize neural implicit learning for textureless surface areas.
  • Textureless surface areas tend to be homogeneous in color and geometrically smooth.
  • Superpixels are used to further regularize neural implicit surface learning.

Regularization of multi-view stereo from neural implicit surface learning

  • Equation 3 of MVS methods optimizes depth and normal predictions.
  • Prior of P (d, n) is usually set as a uniformly random distribution.
  • HelixSurf uses depth and normal learned in current iteration as prior.
  • MVS methods with uniformly random distribution tend to produce noisy results with outliers.
  • Proposed (8) gives better results.

Improving the efficiency by establishing dynamic space occupancies

  • Differentiable volume rendering is computationally expensive.
  • A coarse-to-fine sampling strategy is used to reduce cost.
  • Proposed sampling scheme uses dynamic occupancies in 3D scene space to guide point sampling.
  • Occupancy grids of size 64x3 are used to partition 3D scene space.
  • Exponential moving average is used to update occupancy of voxels.
  • Non-occupied voxels are skipped when performing point sampling.
  • Scheme improves training efficiency by orders of magnitude.

Training and inference

  • HelixSurf is a training process that randomly samples pixels from images.
  • The camera rays passing through these pixels are divided into two sets: R and R.
  • The MLP based functions f and c are optimized using an Eikonal loss and three hyperparameters.
  • During inference, the marching cubes algorithm is used to extract the underlying surface from the learned SDF f.

Experiments

  • Experiments conducted on ScanNet and Tanks and Temples datasets
  • Implementation of HelixSurf in Py-Torch framework with CUDA extensions
  • Adam optimizer used with learning rate of 1e-3
  • 5000 rays sampled for each iteration
  • Evaluation metrics for 3D reconstruction and MVS predictions

Comparisons

  • Reconstruction metrics comparisons on ScanNet [7]
  • HelixSurf surpasses existing methods in almost every metric
  • HelixSurf produces better details of objects than those methods using auxiliary training data
  • HelixSurf improves learning efficiency with orders of magnitude
  • HelixSurf optimized with interactive intertwined regularization
  • Sampling guided by dynamic occupancy grids
  • MVS predictions effectively promote surface learning
  • Leverage homogeneity inside individual superpixels to handle textureless surface areas
  • Adaptive Kmeans clustering algorithm to extract principal normals
  • Mesh-guided consistency on clustered normal maps
  • Adaptively guide point sampling along rays by maintaining dynamic occupancy grids
  • Initialize textureless surface areas with normals generated with Manhattan assumption
  • PatchMatch based multi-view stereo (PM-MVS) method
  • Ray casting technique
  • Textureless triangle faces pruning
  • Evaluation metrics: Accuracy, Completeness, Precision, Recall, and F-score
  • Evaluation metrics for depth and normal map
  • Results on ScanNet, Tanks & Temples, and DTU datasets