Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Signed distance function (SDF) is a promising approach for image-based surface reconstruction
  • Existing optimization methods assume solid surfaces and cannot reconstruct semi-transparent surfaces and thin structures
  • Neural radiance field (NeRF) based methods can model semi-transparency but cannot be easily converted into surfaces without introducing artifacts
  • $\alpha$Surf is a novel surface representation with decoupled geometry and opacity for the reconstruction of semi-transparent and thin surfaces
  • Ray-surface intersections can be found in closed-form via analytical solutions of cubic polynomials
  • $\alpha$Surf can accurately reconstruct surfaces with semi-transparent and thin parts with fewer artifacts

Paper Content

Introduction

  • Recovering surfaces from RGB images is a complex and challenging task in computer vision.
  • Traditional approaches rely on Structure-from-Motion and Multi-View Stereo pipelines.
  • Differentiable rendering techniques have emerged as a more versatile reconstruction procedure.
  • Implicit surface representations show promising results in surface reconstruction.
  • An open challenge is reconstructing implicit surfaces that exhibit semi-transparency effects.
  • Modeling opacity is necessary for the recovery of extremely thin surfaces.
  • Differentiable volume rendering techniques have demonstrated success in rendering semi-transparent and thin objects.
  • NeuS attempts to alleviate this issue but still assumes a solid surface.
  • The representation must explicitly decouple geometry and materials.
  • The differentiable rendering process must consider more than just the nearest intersection.
  • Introduce αSurf, a novel surface representation based on a grid structure.
  • Evaluation on an extended version of the NeRF synthetic dataset.
  • Evaluation on real-world scenes from the LLFF dataset and a scene featuring semi-transparent materials.
  • Neural Radiance Fields (NeRF) model volume density and view-dependent appearance
  • Related works use NeRF for novel view synthesis, 3D asset synthesis, and efficient reconstruction and rendering
  • Direct surface extraction on the density field produces artifacts
  • Depth extraction method does not guarantee watertight surfaces
  • UNISURF attempts to mitigate this issue by transitioning from volume rendering to surface rendering
  • SDF used with differentiable rendering methods to reconstruct surfaces from multi-view images
  • Existing SDF optimization methods assume surface solidity and cannot reconstruct semi-transparent surfaces

Method

  • Goal is to recover an implicit surface of scene objects
  • Representation contains values for geometry and material properties
  • Closed-form evaluation of ray-surface intersections
  • Fully differentiable alpha composition to render intersection points
  • Utilizes Plenoxels to initialize a coarse surface
  • Optimization of photometric loss and surface regularization to reconstruct clean and accurate surfaces
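
The alpha composition step listed above can be illustrated with standard front-to-back compositing over the intersection points of one ray. This is a minimal sketch, not the paper's implementation: the function name `composite`, the `(alpha, rgb)` sample layout, and the black default background are assumptions made for illustration.

```python
def composite(samples, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha compositing along one ray.

    samples: list of (alpha, (r, g, b)) sorted by increasing distance.
    Returns the pixel color and the transmittance left after all samples.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked
    for alpha, rgb in samples:
        weight = transmittance * alpha  # contribution of this intersection
        for ch in range(3):
            color[ch] += weight * rgb[ch]
        transmittance *= 1.0 - alpha
    # Whatever light remains sees the background.
    for ch in range(3):
        color[ch] += transmittance * background[ch]
    return tuple(color), transmittance


# A half-transparent red surface in front of an opaque blue one.
pixel, remaining = composite([(0.5, (1.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 1.0))])
```

Because every operation is a product or a sum, the same loop is differentiable with respect to both opacity and color when written in an autodiff framework, and a surface with opacity below 1 lets the surfaces behind it contribute to the pixel.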

Representation

  • The surface is represented as level sets of a continuous scalar field stored on a voxel grid
  • Trilinear interpolation is used to obtain scalar field values at arbitrary locations
  • Level values are determined by hyperparameter tuning
  • Model surface opacity and view-dependent appearance in same voxel grid
  • Trilinear interpolation used to obtain opacity and SH coefficients at surface locations
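
Trilinear interpolation of a value stored at voxel vertices could look like the sketch below. It assumes a dense nested-list grid indexed as `grid[i][j][k]` and positions given in grid coordinates; the paper uses a sparse grid, and the function name `trilinear` is hypothetical. The same routine would apply to the surface scalar, the opacity, or each SH coefficient.

```python
from math import floor


def trilinear(grid, x, y, z):
    """Trilinearly interpolate a scalar stored at integer voxel vertices."""
    # Clamp the base vertex so the +1 neighbours stay inside the grid.
    i = min(max(int(floor(x)), 0), len(grid) - 2)
    j = min(max(int(floor(y)), 0), len(grid[0]) - 2)
    k = min(max(int(floor(z)), 0), len(grid[0][0]) - 2)
    u, v, w = x - i, y - j, z - k  # local offsets inside the voxel

    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Each corner is weighted by the volume of the opposite sub-box.
                weight = ((u if di else 1.0 - u)
                          * (v if dj else 1.0 - v)
                          * (w if dk else 1.0 - w))
                value += weight * grid[i + di][j + dj][k + dk]
    return value


# A 2x2x2 grid whose vertex values follow x + 2y + 4z.
grid = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)] for i in range(2)]
center = trilinear(grid, 0.5, 0.5, 0.5)
```

Trilinear interpolation reproduces linear functions exactly, so the value at the voxel center equals 0.5 + 1 + 2.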

Differentiable rendering

  • Rendering process does not involve Monte-Carlo sampling or sphere tracing
  • Ray-voxel traversal used to take samples at ray-surface intersections
  • Cubic polynomial used to find real roots of intersections in closed-form
  • Alpha compositing used to render pixel color
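
The reason a cubic appears is that a trilinearly interpolated field, restricted to a ray inside one voxel, is a product of three factors that are each linear in the ray parameter t. The sketch below builds that cubic and solves it in closed form (Cardano's formula and its trigonometric variant). It is an illustration under assumptions, not the paper's code: all function names are hypothetical, the ray is expressed in the voxel's local [0, 1]^3 coordinates, and `t_max` stands for the parameter at which the ray leaves the voxel.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def ray_cubic(corners, origin, direction):
    """Coefficients [c0, c1, c2, c3] of the trilinear field along origin + t * direction."""
    coeffs = [0.0, 0.0, 0.0, 0.0]
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                term = [corners[a][b][c]]
                for bit, o, d in zip((a, b, c), origin, direction):
                    # Interpolation weight is x for bit 1 and (1 - x) for bit 0,
                    # where x = o + t * d is linear in t.
                    term = poly_mul(term, [o, d] if bit else [1.0 - o, -d])
                coeffs = [s + v for s, v in zip(coeffs, term)]
    return coeffs


def cbrt(x):
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def real_roots(c0, c1, c2, c3, eps=1e-12):
    """Real roots of c3*t^3 + c2*t^2 + c1*t + c0 = 0, in increasing order."""
    if abs(c3) < eps:  # degenerate case: quadratic or linear
        if abs(c2) < eps:
            return [] if abs(c1) < eps else [-c0 / c1]
        disc = c1 * c1 - 4.0 * c2 * c0
        if disc < 0.0:
            return []
        r = math.sqrt(disc)
        return sorted({(-c1 - r) / (2.0 * c2), (-c1 + r) / (2.0 * c2)})
    # Reduce to the depressed cubic s^3 + p*s + q = 0 with t = s - shift.
    a, b, c = c2 / c3, c1 / c3, c0 / c3
    shift = a / 3.0
    p = b - a * a / 3.0
    q = 2.0 * a ** 3 / 27.0 - a * b / 3.0 + c
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0.0:  # one real root (Cardano's formula)
        r = math.sqrt(disc)
        return [cbrt(-q / 2.0 + r) + cbrt(-q / 2.0 - r) - shift]
    if -p < eps:  # (near) triple root
        return [-shift]
    # Three real roots (trigonometric form).
    m = 2.0 * math.sqrt(-p / 3.0)
    theta = math.acos(max(-1.0, min(1.0, 3.0 * q / (p * m)))) / 3.0
    return sorted(m * math.cos(theta - 2.0 * math.pi * k / 3.0) - shift for k in range(3))


def intersections(corners, origin, direction, level, t_max):
    """Ray parameters in [0, t_max] where the field equals the level value."""
    c0, c1, c2, c3 = ray_cubic(corners, origin, direction)
    return [t for t in real_roots(c0 - level, c1, c2, c3) if 0.0 <= t <= t_max]


# Field x*y*z inside one voxel: only corner (1, 1, 1) is non-zero.
corners = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
hits = intersections(corners, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), level=0.125, t_max=1.0)
```

Along the voxel diagonal this field is t^3, so the level 0.125 is crossed once, near t = 0.5. Since a cubic has at most three real roots, a single voxel can yield up to three intersections per level set, and no iterative marching is needed.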

Optimization

  • NeRF can be used to initialize coarse surfaces
  • Plenoxels is a grid-based NeRF method that can be trained quickly
  • Raw level values are used to define initial surfaces
  • Density is normalized to be used as initial surface scalars
  • Opacity and SH field are initialized from Plenoxels
  • SDF methods cannot easily take advantage of initializing from Plenoxels
  • Truncated Alpha Compositing is used to regularize artifacts
  • Ray intersections on backward-facing surfaces are removed
  • Re-weighted sample opacity is used
  • Surface convergence loss is applied to encourage different level sets to converge
  • L1 normal smoothness regularization and TV loss are applied to smooth surface
  • Weight-based entropy loss and L1 regularization are applied to regularize opacity field
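
As one concrete instance of the smoothness terms above, a total variation (TV) penalty on a scalar grid can be sketched as follows. The exact form used in the paper is not given in this summary, so the forward-difference, isotropic formulation here is an assumption, as is the name `tv_loss`. The boundary handling treats a missing neighbour as equal to the voxel itself, which corresponds to a zero-gradient (Neumann) boundary condition.

```python
import math


def tv_loss(grid):
    """Mean isotropic total variation of a dense 3D grid of scalars."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    total = 0.0
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                v = grid[i][j][k]
                # Forward differences; at the far boundary the difference is
                # set to zero (zero-gradient boundary condition).
                dx = grid[i + 1][j][k] - v if i + 1 < nx else 0.0
                dy = grid[i][j + 1][k] - v if j + 1 < ny else 0.0
                dz = grid[i][j][k + 1] - v if k + 1 < nz else 0.0
                total += math.sqrt(dx * dx + dy * dy + dz * dz)
    return total / (nx * ny * nz)


flat = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
step = [[[0.0]], [[1.0]]]  # a single unit jump along x
flat_tv, step_tv = tv_loss(flat), tv_loss(step)
```

A constant field incurs no penalty, while the unit jump is counted once and averaged over two voxels.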

Evaluation

  • Evaluated the method on an extended version of the NeRF synthetic dataset
  • Compared with recent SDF optimization methods
  • Compared with density level set and depth-extracted geometry from Plenoxels and MipNeRF 360

Datasets

  • 8 objects with challenging properties rendered with Blender
  • 100 different training views from a full sphere
  • 2 additional synthetic datasets with thin structures and semi-transparent materials
  • Qualitatively evaluate on scenes from LLFF dataset

Implementation details

  • Uses a sparse voxel grid of size $512^3$
  • Stores surface scalar $\delta$, raw opacity $\sigma_\alpha$ and 9 SH coefficients
  • Initializes all grid values from Plenoxels pre-trained with provided hyperparameters
  • Prunes voxels with densities σ lower than 5
  • Downscales density values during initialization
  • Initializes 5 level sets at levels $\tau_\sigma$ for synthetic datasets
  • Linearly decays truncated alpha compositing parameter a from 5 to 2 in 10k iterations
  • Initializes a single level set $\tau_\sigma$ for real-world scenes
  • Trains for 50k iterations with a batch size of 5k rays
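
The linear decay of the truncated alpha compositing parameter a can be written as a small schedule function. This sketch assumes that a is held at its final value once the 10k decay iterations are over, which the summary does not state; the function name and argument names are illustrative.

```python
def truncation_parameter(iteration, start=5.0, end=2.0, decay_iters=10_000):
    """Linearly interpolate from start to end over decay_iters, then hold end."""
    frac = min(max(iteration / decay_iters, 0.0), 1.0)
    return start + (end - start) * frac


schedule = [truncation_parameter(it) for it in (0, 5_000, 10_000, 50_000)]
```

With the values listed above, the parameter starts at 5, passes 3.5 halfway through the decay, and stays at 2 for the remaining 40k iterations under this assumption.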

Baselines

  • Compared method with state-of-the-art SDF-based and NeRF-based methods
  • Compared with NeuS and HFS
  • Did not include IDR, UNISURF, or VolSDF
  • Compared with Plenoxels and MipNeRF 360
  • Used 100 spiral views with elevation from -180 to 180 to cover the object, and extracted the samples with the highest weight
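
Extracting the highest-weight sample from a density field can be sketched with the usual NeRF volume rendering weights, where each interval has opacity 1 - exp(-density * length) and its weight is that opacity times the accumulated transmittance. The exact extraction procedure used for the baselines is not described in this summary, so the function below is an assumption-laden illustration with hypothetical names.

```python
import math


def highest_weight_sample(densities, depths):
    """Depth and weight of the interval with the largest rendering weight.

    densities[i] applies to the interval [depths[i], depths[i + 1]].
    """
    transmittance = 1.0
    best_weight, best_depth = 0.0, None
    for i in range(len(depths) - 1):
        delta = depths[i + 1] - depths[i]
        alpha = 1.0 - math.exp(-densities[i] * delta)
        weight = transmittance * alpha
        if weight > best_weight:
            best_weight, best_depth = weight, depths[i]
        transmittance *= 1.0 - alpha
    return best_depth, best_weight


# Empty space followed by a dense region starting at depth 3.
depth, weight = highest_weight_sample([0.0, 0.0, 50.0, 50.0], [1.0, 2.0, 3.0, 4.0, 5.0])
```

Collecting one such point per ray yields a point cloud rather than a closed mesh, which is consistent with the earlier remark that depth extraction does not guarantee watertight surfaces.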

Evaluation on synthetic dataset

  • Our method was evaluated on 3 synthetic datasets
  • We used the Chamfer-L1 distance to measure performance
  • HFS failed on two scenes in the Thin Blender dataset
  • Our non-trimmed single surface provides better watertight surfaces than the watertight baselines
  • Our trimmed representation further improves the Chamfer distance
  • NeuS and HFS miss thin structures and do not perform well on semi-transparent materials
  • Level set surfaces from Plenoxels and MipNeRF 360 contain holes, floaters, and inner surfaces
  • Our approach is capable of reconstructing those surfaces with minimal artifacts
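
For reference, a Chamfer-L1 distance between two sampled point sets can be computed as below. This sketch assumes the common definition as the average of the two one-sided mean nearest-neighbour Euclidean distances; the paper's exact normalization and sampling are not given in this summary, and the brute-force search is only suitable for small point sets.

```python
import math


def one_sided(src, dst):
    """Mean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def chamfer_l1(a, b):
    """Symmetric Chamfer distance using unsquared Euclidean distances."""
    return 0.5 * (one_sided(a, b) + one_sided(b, a))


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
distance = chamfer_l1(predicted, reference)
```

The two directions penalize different failures: points of the prediction far from the reference (floaters, inner surfaces) raise the first term, while parts of the reference with no nearby prediction (holes, missed thin structures) raise the second.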

Evaluation on real world dataset

  • We show qualitative evaluation of watertight surfaces on real-world scenes
  • NeuS fails to reconstruct proper surfaces
  • Plenoxels can recover surfaces with high-quality details but tends to miss some surfaces and produce noisy density floaters
  • Our approach initializes from Plenoxels and further optimizes with surface regularizations to reconstruct surfaces with fewer artifacts
  • We present αSurf, a grid-based implicit surface representation with decoupled geometry, opacity, and appearance
  • We develop closed-form intersection finding and differentiable alpha compositing to optimize the surface via photometric loss
  • Our representation utilizes coarse initialization from efficient Plenoxels
  • We incorporate truncated rendering and additional surface regularizations to reconstruct high-quality surfaces for semi-transparent objects and thin structures with heavy blending effects
  • We determine the ray-surface intersections through the analytical solution of cubic polynomials
  • We use the same delayed exponential learning rate schedule for training
  • We use RMSProp optimizer for training
  • We use grid search to determine the optimal hyperparameters
  • We use Neumann boundary conditions when computing the surface TV loss
  • We show ablation study and comparison with the Eikonal constraint regularization
  • We show additional experiment details and evaluation results
  • We show reconstruction results on some DTU scenes