Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Recovery of scene geometry from multi-view images is a challenge in computer vision research.
Recent methods leverage neural implicit surface learning and differentiable volume rendering.
Traditional multi-view stereo can recover the geometry of scenes with rich textures.
HelixSurf intertwines the regularization from these two strategies during the learning process.
HelixSurf is efficient and faster than existing methods.

Paper Content

Introduction

Challenge in computer vision research: surface reconstruction from multi-view images
Different paradigms of methods exist to address the challenge
Multi-view stereo (MVS) methods recover properties of surface points by optimizing local pixel-wise correspondences
Differentiable volume rendering connects observed images with neural modeling of implicit surface and radiance field
HelixSurf combines MVS and neural implicit surface learning, using each strategy to regularize the learning/optimization of the other
HelixSurf also regularizes learning on textureless surface areas by leveraging the region-wise homogeneity of superpixels
Adaptive point sampling along rays improves the efficiency of differentiable volume rendering

Related works

PatchMatch-based multi-view stereo

3D reconstruction from posed multi-view images is a challenging task in computer vision
PatchMatch based Multi-view Stereo (PM-MVS) is traditionally the most explored technique
PM-MVS methods represent the geometry with depth and/or normal maps
Depth and/or normal of each pixel is estimated by exploiting inter-image photometric and geometric consistency
Filtering operations are used to fuse all the depth maps into a global point cloud
Meshing algorithms are used to recover complete surface
Traditional methods have achieved great success but can produce artifacts and missing parts in textureless areas
Deep learning-based MVS methods have demonstrated promising performance but rely on ground-truth 3D data for supervision
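PM-MVS stores geometry as per-pixel depth (and normal) maps, which are then lifted to 3D and fused into a point cloud. The following is a minimal sketch of the lifting step. It assumes a pinhole camera with intrinsics `fx, fy, cx, cy` and a camera-to-world pose `(R, t)`. All names are illustrative, and the geometric-consistency filtering that real pipelines apply before fusion is omitted.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
    R: Sequence[Sequence[float]],  # assumed 3x3 camera-to-world rotation
    t: Sequence[float],            # assumed camera center in world coordinates
) -> List[Vec3]:
    """Lift every pixel with positive depth to a 3D point in world space."""
    points: List[Vec3] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:  # non-positive depth marks an invalid or filtered pixel
                continue
            # Pinhole model: pixel (u, v) at depth d -> camera-space point.
            pc = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            # Camera space -> world space: p_w = R @ p_c + t.
            pw = tuple(sum(R[i][j] * pc[j] for j in range(3)) + t[i] for i in range(3))
            points.append(pw)
    return points


def fuse(views) -> List[Vec3]:
    """Naive fusion: concatenate the points of every (depth, camera) pair."""
    cloud: List[Vec3] = []
    for depth, cam in views:
        cloud.extend(backproject_depth(depth, *cam))
    return cloud
```

Simple concatenation is the crudest possible fusion. Practical systems keep a point only when several views agree on it, which is where textureless areas tend to leave holes.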

Neural implicit surface

Recent works use neural networks to implicitly represent surfaces
Neural networks output signed/unsigned distance fields or occupancy fields
Surface rendering is used to reconstruct 3D shapes from 2D images
Differentiable volume rendering techniques are used to eliminate the need for masks
Follow-up works improve geometry quality with fine-grained surface details
Deep networks have a smoothness bias, which regularizes learning but discourages them from recovering fine details
Recent works incorporate geometric cues from pre-trained models to resolve this dilemma
HelixSurf integrates traditional PM-MVS and neural implicit surface learning for better results
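Surface rendering of an SDF requires the first intersection between each ray and the zero-level set. Sphere tracing is one classic way to find it. The sketch below runs sphere tracing against an analytic sphere SDF, which stands in for a learned network. This is a generic technique shown for illustration. It is not necessarily what the cited surface-rendering works use, since some of them use root-finding instead.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sphere_sdf(p: Sequence[float], center: Vec3 = (0.0, 0.0, 0.0), radius: float = 1.0) -> float:
    """Exact signed distance to a sphere: negative inside, zero on the surface."""
    return math.dist(p, center) - radius


def sphere_trace(
    sdf: Callable[[Vec3], float],
    origin: Vec3,
    direction: Vec3,  # must be unit length so that |sdf| is a safe step size
    t_max: float = 100.0,
    eps: float = 1e-6,
    max_steps: int = 256,
) -> Optional[float]:
    """March along the ray by the SDF value; return the hit depth t, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:  # close enough to the zero-level set
            return t
        t += dist       # an exact SDF guarantees no surface within this radius
        if t > t_max:
            break
    return None
```

For example, a ray starting at `(0, 0, -3)` and heading along `+z` hits the unit sphere at `t = 2`.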

Preliminary

Neural implicit surface representation encodes a continuous surface as the zero-level set of a signed distance field (SDF)
Following DeepSDF, the SDF is represented by a parameterized MLP f
Differentiable volume rendering is used to synthesize novel views
NeRF models a continuous scene space as a neural radiance field
Volume density is modeled as a transformed function of the implicit SDF f
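This summary does not specify which SDF-to-density transform HelixSurf uses. The sketch below assumes the Laplace-CDF transform popularized by VolSDF, σ = Ψ_β(−f)/β, combined with standard NeRF-style alpha compositing. It shows how a rendered depth follows from SDF samples along one ray. The function names and the default `beta` are illustrative.

```python
import math
from typing import List, Sequence


def laplace_cdf(s: float, beta: float) -> float:
    """CDF of a zero-mean Laplace distribution with scale beta."""
    if s <= 0.0:
        return 0.5 * math.exp(s / beta)
    return 1.0 - 0.5 * math.exp(-s / beta)


def sdf_to_density(sdf_value: float, beta: float = 0.1) -> float:
    """Assumed VolSDF-style density sigma = Psi_beta(-f) / beta: ~1/beta inside, ~0 far outside."""
    return laplace_cdf(-sdf_value, beta) / beta


def render_weights(ts: Sequence[float], sdfs: Sequence[float], beta: float = 0.1) -> List[float]:
    """Quadrature weights w_i = T_i * (1 - exp(-sigma_i * delta_i)) for sorted ray depths ts."""
    weights: List[float] = []
    optical_depth = 0.0  # sum_{j<i} sigma_j * delta_j, so T_i = exp(-optical_depth)
    for i, (t, f) in enumerate(zip(ts, sdfs)):
        delta = ts[i + 1] - t if i + 1 < len(ts) else 1e10  # treat the last interval as unbounded
        sigma_delta = sdf_to_density(f, beta) * delta
        weights.append(math.exp(-optical_depth) * (1.0 - math.exp(-sigma_delta)))
        optical_depth += sigma_delta
    return weights


def render_depth(ts: Sequence[float], sdfs: Sequence[float], beta: float = 0.1) -> float:
    """Expected termination depth sum_i w_i * t_i; the same weights also composite colors."""
    return sum(w * t for w, t in zip(render_weights(ts, sdfs, beta), ts))
```

For a plane two units in front of the camera (`sdf = 2 - t` along the ray), the rendered depth lands near 2. The remaining bias shrinks with `beta`, which is one reason SDF-induced densities recover sharper surfaces than a free-form density.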
Multi-view stereo (MVS) with PatchMatch is used to recover the scene geometry
PatchMatch is used to establish pixel-wise correspondences across multi-view images
Color similarity and forward-backward reprojection error are used to evaluate depth and normal hypotheses
A per-view selection probability is computed, and Monte-Carlo sampling draws source views according to it
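A common color-similarity measure in PM-MVS is normalized cross-correlation (NCC) between a reference patch and its warped counterpart in a source view. The sketch below computes a 1 − NCC cost and then draws source views by Monte-Carlo sampling from given per-view probabilities. Several pieces are assumptions or simplifications:

- Patch warping through the homography induced by a depth/normal hypothesis is omitted.
- The view probabilities are assumed to be given.
- Views are sampled with replacement for simplicity.

```python
import math
import random
from typing import List, Sequence


def ncc(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Normalized cross-correlation of two equally sized, flattened patches."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb + eps)  # eps avoids 0/0 on flat (textureless) patches


def photometric_cost(ref_patch: Sequence[float], src_patch: Sequence[float]) -> float:
    """1 - NCC: near 0 for matching patches, near 2 for inverted ones."""
    return 1.0 - ncc(ref_patch, src_patch)


def sample_views(view_probs: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Monte-Carlo draw of k source-view indices (with replacement), proportional to view_probs."""
    return rng.choices(range(len(view_probs)), weights=view_probs, k=k)
```

A constant patch yields an NCC of 0 and a cost of 1 for every hypothesis. This illustrates why photometric matching cannot discriminate depths in textureless areas.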

HelixSurf for intertwined regularization of neural implicit surface learning

The task is to reconstruct scene geometry with fine details
An MLP-based radiance field function F connects scene geometry with image observations
An SDF-induced volume density is used to reconstruct the surface
The MLP-based function has deep priors that induce a continuous, piecewise-smooth surface
PatchMatch-based MVS methods couple the predictions $\{d_l, \mathbf{n}_l\}$ of individual pixels $l$ in a probabilistic framework
HelixSurf is an integrated solution that takes advantage of both strategies
Iterative intertwined regularization is used during the learning process

Regularization of neural implicit surface learning from MVS predictions

Neural implicit surface learning samples rays in 3D space
Each ray emanates from the camera center and passes through a pixel of an image
A color loss along each ray provides image-based supervision for learning
The depth and surface normal rendered along each ray can be supervised by the corresponding MVS predictions
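The exact form of this regularizer is not given in the summary. A plausible sketch combines an L1 depth term with an angular (1 − cosine) normal term. It applies both only to rays whose MVS prediction is marked reliable. The loss form, the `reliable` mask, and the weights `w_depth` / `w_normal` are all assumptions.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _normalize(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return tuple(x / n for x in v)


def mvs_regularization(
    rendered_depth: Sequence[float],
    rendered_normal: Sequence[Vec3],
    mvs_depth: Sequence[float],
    mvs_normal: Sequence[Vec3],
    reliable: Sequence[bool],  # hypothetical mask of rays with trustworthy MVS output
    w_depth: float = 1.0,      # hypothetical weights
    w_normal: float = 1.0,
) -> float:
    """Assumed loss: mean of L1 depth error plus (1 - cos) normal error over reliable rays."""
    total, count = 0.0, 0
    for dr, nr, dm, nm, ok in zip(rendered_depth, rendered_normal, mvs_depth, mvs_normal, reliable):
        if not ok:  # skip rays where the MVS prediction was filtered out
            continue
        cos = sum(a * b for a, b in zip(_normalize(nr), _normalize(nm)))
        total += w_depth * abs(dr - dm) + w_normal * (1.0 - cos)
        count += 1
    return total / count if count else 0.0
```

Masking is the key design choice here. MVS output is trustworthy mainly on textured regions, so unreliable rays are left to other regularizers.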

Handling of textureless surface areas

PatchMatch based MVS methods are reliable on texture-rich surface areas.
Other sources of regularization are used for neural implicit surface learning on textureless surface areas.
Textureless surface areas tend to be homogeneous in color and geometrically smooth.
Superpixels are used to further regularize neural implicit surface learning.
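One plausible form of the superpixel regularizer pulls the rendered normals inside each superpixel toward that superpixel's mean normal, which encodes the "homogeneous and smooth" assumption. The loss form is an assumption, and the superpixel labels are assumed to be precomputed (for example by SLIC).

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return tuple(x / n for x in v)


def superpixel_normal_loss(normals: Sequence[Vec3], labels: Sequence[int]) -> float:
    """Assumed loss: mean (1 - cos) between each normal and its superpixel's mean normal."""
    if not normals:
        return 0.0
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for n, lab in zip(normals, labels):
        u = _unit(n)
        for k in range(3):
            sums[lab][k] += u[k]
    means = {lab: _unit(s) for lab, s in sums.items()}  # one principal direction per superpixel
    losses = [
        1.0 - sum(a * b for a, b in zip(_unit(n), means[lab]))
        for n, lab in zip(normals, labels)
    ]
    return sum(losses) / len(losses)
```

The loss is zero when every superpixel is geometrically flat and grows as normals inside a superpixel disagree.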

Regularization of multi-view stereo from neural implicit surface learning

Equation (3) of MVS methods optimizes depth and normal predictions.
The prior $P(d, \mathbf{n})$ is usually set to a uniform random distribution.
HelixSurf instead uses the depth and normal learned in the current iteration as the prior.
MVS methods with a uniform prior tend to produce noisy results with outliers.
The proposed formulation, Eq. (8), gives better results, with less noise and fewer outliers.
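The effect of replacing the uniform prior can be illustrated on a single pixel. Each depth hypothesis receives a photometric log-likelihood, and the selected hypothesis maximizes log-likelihood plus log-prior. Several details are assumptions made for illustration:

- The prior is a Gaussian centered on the depth from the neural surface.
- Its width `prior_sigma` and the likelihood `temperature` are assumed values.
- Eq. (8) itself is not reproduced here.

```python
import math
from typing import Optional, Sequence


def select_depth(
    hypotheses: Sequence[float],
    photometric_cost: Sequence[float],    # lower is better, e.g. 1 - NCC
    prior_depth: Optional[float] = None,  # depth from the neural surface; None = uniform prior
    prior_sigma: float = 0.1,             # assumed prior width
    temperature: float = 0.1,             # assumed likelihood sharpness
) -> float:
    """MAP depth under likelihood exp(-cost / temperature) and an optional Gaussian prior."""
    best_d, best_score = hypotheses[0], -math.inf
    for d, c in zip(hypotheses, photometric_cost):
        log_like = -c / temperature
        # A uniform prior adds the same constant to every hypothesis, so it is dropped.
        log_prior = 0.0 if prior_depth is None else -0.5 * ((d - prior_depth) / prior_sigma) ** 2
        score = log_like + log_prior
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

In a textureless region, costs are nearly flat, so a spurious hypothesis can win by a small margin. With `hypotheses = [1.0, 2.0, 3.5]` and costs `[0.30, 0.25, 0.20]`, the uniform prior picks the outlier `3.5`. A prior centered at `2.05` picks `2.0` instead.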

Improving the efficiency by establishing dynamic space occupancies

Differentiable volume rendering is computationally expensive.
A coarse-to-fine sampling strategy is used to reduce cost.
Proposed sampling scheme uses dynamic occupancies in 3D scene space to guide point sampling.
Occupancy grids of size $64^3$ are used to partition the 3D scene space.
An exponential moving average is used to update the occupancy of voxels.
Non-occupied voxels are skipped when performing point sampling.
The scheme improves training efficiency by orders of magnitude.
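The occupancy mechanism can be sketched as follows. The sketch assumes:

- The scene is normalized to the unit cube.
- Density is estimated for the EMA by a single query at each voxel center.
- The decay and threshold take the illustrative values shown.

A real implementation would run on the GPU and jump over empty voxels with a grid traversal rather than testing every candidate point.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class OccupancyGrid:
    """Dense res^3 grid over the unit cube [0, 1)^3 holding an EMA of density per voxel."""

    def __init__(self, res: int = 64, decay: float = 0.95, threshold: float = 0.01):
        self.res, self.decay, self.threshold = res, decay, threshold
        self.values = [0.0] * (res ** 3)

    def _index(self, p: Sequence[float]) -> int:
        # Clamp so that points on the far boundary still map to a valid voxel.
        i, j, k = (min(self.res - 1, max(0, int(c * self.res))) for c in p)
        return (i * self.res + j) * self.res + k

    def update(self, density_fn: Callable[[Vec3], float]) -> None:
        """EMA update: occ <- decay * occ + (1 - decay) * density(voxel center)."""
        r = self.res
        for i in range(r):
            for j in range(r):
                for k in range(r):
                    center = ((i + 0.5) / r, (j + 0.5) / r, (k + 0.5) / r)
                    idx = (i * r + j) * r + k
                    self.values[idx] = (
                        self.decay * self.values[idx] + (1.0 - self.decay) * density_fn(center)
                    )

    def occupied(self, p: Sequence[float]) -> bool:
        return self.values[self._index(p)] > self.threshold

    def sample_ray(self, origin: Vec3, direction: Vec3, n_steps: int = 128, t_far: float = 2.0) -> List[float]:
        """Evenly spaced depths along the ray, keeping only those inside occupied voxels."""
        ts = []
        for s in range(n_steps):
            t = (s + 0.5) * t_far / n_steps
            p = tuple(o + t * d for o, d in zip(origin, direction))
            if all(0.0 <= c < 1.0 for c in p) and self.occupied(p):
                ts.append(t)
        return ts
```

The EMA lets occupancy follow the geometry as it sharpens during training. Regions that become empty fade below the threshold instead of being pruned permanently after a single bad estimate.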

Training and inference

During training, HelixSurf randomly samples pixels from the images.
The camera rays passing through these pixels are divided into two sets.
The MLP-based functions f and c are optimized with a total loss that includes an Eikonal loss, with three hyperparameters weighting the loss terms.
During inference, the marching cubes algorithm is used to extract the underlying surface from the learned SDF f.
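The Eikonal loss penalizes deviation of the SDF gradient norm from 1, the defining property of a distance field. The sketch below estimates the gradient with central differences on an analytic function. A learned MLP would typically use automatic differentiation instead, so the finite-difference step `h` is an illustrative choice. The weighting of this term against the other losses is not reproduced.

```python
import math
from typing import Callable, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def numerical_gradient(f: Callable[[Vec3], float], p: Vec3, h: float = 1e-4) -> Vec3:
    """Central-difference estimate of grad f at p."""
    grad = []
    for i in range(3):
        step = [0.0, 0.0, 0.0]
        step[i] = h
        plus = tuple(a + b for a, b in zip(p, step))
        minus = tuple(a - b for a, b in zip(p, step))
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return tuple(grad)


def eikonal_loss(f: Callable[[Vec3], float], points: Sequence[Vec3], h: float = 1e-4) -> float:
    """Mean of (||grad f(p)|| - 1)^2 over sample points; ~0 for a true SDF."""
    total = 0.0
    for p in points:
        g = numerical_gradient(f, p, h)
        total += (math.sqrt(sum(x * x for x in g)) - 1.0) ** 2
    return total / len(points)
```

An exact sphere SDF gives a loss near 0. Scaling it by 2 keeps the same zero-level set but gives a loss of 1. This shows that the Eikonal term constrains the field away from the surface, not the surface itself.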

Experiments

Experiments conducted on ScanNet and Tanks and Temples datasets
Implementation of HelixSurf in the PyTorch framework with CUDA extensions
Adam optimizer used with learning rate of 1e-3
5000 rays sampled for each iteration
Evaluation metrics for 3D reconstruction and MVS predictions
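The 3D reconstruction metrics listed for this evaluation (Accuracy, Completeness, Precision, Recall, F-score) can be sketched on point sets sampled from the predicted and ground-truth meshes. The mesh sampling is assumed to be done beforehand. The 5 cm threshold `tau = 0.05` is a common choice for ScanNet and is assumed here, and the brute-force nearest-neighbor search is for clarity only.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def _nearest_dists(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Distance from each src point to its nearest dst point (brute force, O(N*M))."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def reconstruction_metrics(pred: Sequence[Point], gt: Sequence[Point], tau: float = 0.05) -> Dict[str, float]:
    """Acc/Comp are mean distances; Prec/Recall are fractions within tau; F is their harmonic mean."""
    d_pred = _nearest_dists(pred, gt)  # predicted -> ground truth
    d_gt = _nearest_dists(gt, pred)    # ground truth -> predicted
    acc = sum(d_pred) / len(d_pred)
    comp = sum(d_gt) / len(d_gt)
    prec = sum(d < tau for d in d_pred) / len(d_pred)
    recall = sum(d < tau for d in d_gt) / len(d_gt)
    fscore = 2 * prec * recall / (prec + recall) if prec + recall > 0 else 0.0
    return {"acc": acc, "comp": comp, "prec": prec, "recall": recall, "fscore": fscore}
```

Accuracy and Completeness are distances (lower is better). Precision, Recall, and F-score are ratios (higher is better). The F-score is high only when the reconstruction is both clean and complete.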

Comparisons

Reconstruction metrics comparisons on ScanNet [7]
HelixSurf surpasses existing methods in almost every metric
HelixSurf recovers finer object details than methods that use auxiliary training data
HelixSurf improves learning efficiency by orders of magnitude
HelixSurf is optimized with iterative intertwined regularization
Sampling guided by dynamic occupancy grids
MVS predictions effectively promote surface learning
Leverage homogeneity inside individual superpixels to handle textureless surface areas
Adaptive K-means clustering algorithm to extract principal normals
Mesh-guided consistency on clustered normal maps
Adaptively guide point sampling along rays by maintaining dynamic occupancy grids
Initialize textureless surface areas with normals generated under the Manhattan assumption
PatchMatch based multi-view stereo (PM-MVS) method
Ray casting technique
Pruning of textureless triangle faces
Evaluation metrics: Accuracy, Completeness, Precision, Recall, and F-score
Evaluation metrics for depth and normal map
Results on ScanNet, Tanks & Temples, and DTU datasets
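The principal-normal extraction can be approximated with plain spherical K-means on unit normals, using cosine similarity. This is a simplification. The adaptive selection of the number of clusters is not reproduced, and both the fixed `k` and the deterministic initialization are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Vec3:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return tuple(x / n for x in v)


def spherical_kmeans(normals: Sequence[Sequence[float]], k: int, iters: int = 20):
    """Cluster normals by cosine similarity; return (unit centers, per-normal labels)."""
    pts = [_unit(n) for n in normals]
    # Assumed deterministic init: k evenly spaced input normals.
    centers = [pts[i * len(pts) // k] for i in range(k)]
    labels: List[int] = [0] * len(pts)
    for _ in range(iters):
        # Assignment step: nearest center = highest dot product for unit vectors.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(p, centers[c])))
            for p in pts
        ]
        # Update step: renormalized mean of each cluster's members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == c]
            if members:
                new_centers.append(_unit([sum(m[i] for m in members) for i in range(3)]))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels
```

For example, noisy floor normals near `(0, 0, 1)` and wall normals near `(1, 0, 0)` separate into two clusters whose centers are the dominant floor and wall directions.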
