Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Signed distance function (SDF) is a promising approach for image-based surface reconstruction
- Existing optimization methods assume solid surfaces and cannot reconstruct semi-transparent surfaces and thin structures
- Neural radiance field (NeRF) based methods can model semi-transparency but cannot be easily converted into surfaces without introducing artifacts
- $\alpha$Surf is a novel surface representation with decoupled geometry and opacity for the reconstruction of semi-transparent and thin surfaces
- Ray-surface intersections can be found in closed-form via analytical solutions of cubic polynomials
- $\alpha$Surf can accurately reconstruct surfaces with semi-transparent and thin parts with fewer artifacts
Paper Content
Introduction
- Recovering surfaces from RGB images is a complex and challenging task in computer vision.
- Traditional approaches rely on Structure-from-Motion and Multi-View Stereo pipelines.
- Differentiable rendering techniques have emerged as a more versatile reconstruction procedure.
- Implicit surface representations show promising results in surface reconstruction.
- Open challenge is reconstructing implicit surfaces that exhibit semi-transparency effects.
- Modeling opaqueness is necessary for the recovery of extremely thin surfaces.
- Differentiable volume rendering techniques have demonstrated success in rendering semitransparent and thin objects.
- NeuS attempts to alleviate issue but still assumes a solid surface.
- Representation must explicitly decouple geometry and materials.
- Differentiable rendering process must consider more than just the nearest intersection.
- Introduce αSurf, a novel surface representation based on a grid structure.
- Evaluation on extended version of NeRF synthetic dataset.
- Evaluation on real-world scenes from LLFF dataset and a scene featuring semi-transparent materials.
Related works
- Neural Radiance Fields NeRF models volume density and view-dependent appearance
- Related works use NeRF for novel view synthesis, 3D asset synthesis, and efficient reconstruction and rendering
- Direct surface extraction on the density field produces artifacts
- Depth extraction method does not guarantee watertight surfaces
- UNISURF attempts to mitigate issue by transitioning from volume rendering to surface rendering
- SDF used with differentiable rendering methods to reconstruct surfaces from multi-view images
- Existing SDF optimization methods assume surface solidity and cannot reconstruct semi-transparent surfaces
Method
- Goal is to recover an implicit surface of scene objects
- Representation contains values for geometry and material properties
- Closed-form evaluation of ray-surface intersections
- Fully differentiable alpha composition to render intersection points
- Utilizes Plenoxels to initialize a coarse surface
- Optimization of photometric loss and surface regularization to reconstruct clean and accurate surfaces
Representation
- Represent surface as level sets of a continuous scalar field
- Trilinear interpolation used to obtain scalar field values
- Surface defined as level sets of scalar field
- Hyperparameter tuning used to determine level values
- Model surface opacity and view-dependent appearance in same voxel grid
- Trilinear interpolation used to obtain opacity and SH coefficients at surface locations
Differentiable rendering
- Rendering process does not involve Monte-Carlo sampling or sphere tracing
- Ray-voxel traversal used to take samples at ray-surface intersections
- Cubic polynomial used to find real roots of intersections in closed-form
- Alpha compositing used to render pixel color
Optimization
- NeRF can be used to initialize coarse surfaces
- Plenoxels is a grid-based NeRF method that can be trained quickly
- Raw level values are used to define initial surfaces
- Density is normalized to be used as initial surface scalars
- Opacity and SH field are initialized from Plenoxels
- SDF methods cannot easily take advantage of initializing from Plenoxels
- Truncated Alpha Compositing is used to regularize artifacts
- Ray intersections on backward-facing surfaces are removed
- Re-weighted sample opacity is used
- Surface convergence loss is applied to encourage different level sets to converge
- L1 normal smoothness regularization and TV loss are applied to smooth surface
- Weight-based entropy loss and L1 regularization are applied to regularize opacity field
Evaluation
- Evaluated method on extended version of NeRF synthetic dataset
- Compared with recent SDF optimization methods
- Compared with density level set and depth extracted geometry from Plenoxels and MipN-eRF360
Datasets
- 8 objects with challenging properties rendered with Blender
- 100 different training views from a full sphere
- 2 additional synthetic datasets with thin structures and semi-transparent materials
- Qualitatively evaluate on scenes from LLFF dataset
Implementation details
- Uses a sparse voxel grid of size 512 3
- Stores surface scalar δ, raw opacity σ α and 9 SH coefficients
- Initializes all grid values from Plenoxels pre-trained with provided hyperparameters
- Prunes voxels with densities σ lower than 5
- Downscales density values during initialization
- Initializes 5 level sets at τ σ for synthetic datasets
- Linearly decays truncated alpha compositing parameter a from 5 to 2 in 10k iterations
- Initializes a single level set τ σ for real-world scenes
- Trains for 50k iterations with a batch size of 5k rays
Baselines
- Compared method with state-of-the-art SDF-based and NeRF-based methods
- Compared with NeuS and HFS
- Did not include IDR, UNISURF, or VolSDF
- Compared with Plenoxels and MipNeRF 360
- Used 100 spiral views from elevation -180 to 180 to cover object and extract samples with highest weight
Evaluation on synthetic dataset
- Our method was evaluated on 3 synthetic datasets
- We used the Chamfer-L1 distance to measure performance
- HFS failed on two scenes in the Thin Blender dataset
- Our non-trimmed single surface provides better watertight surfaces than the watertight baselines
- Our trimmed representation further improves the Chamfer distance
- NeuS and HFS miss thin structures and do not perform well on semi-transparent materials
- Level set surfaces from Plenoxels and MipNeRF360 contain holes, floaters, and inner surfaces
- Our approach is capable of reconstructing those surfaces with minimal artifacts
Evaluation on real world dataset
- We show qualitative evaluation of watertight surfaces on real-world scenes
- NeuS fails to reconstruct proper surfaces
- Plenoxels can recover surfaces with high-quality details but tends to miss some surfaces and produce noisy density floaters
- Our approach initializes from Plenoxels and further optimizes with surface regularizations to reconstruct surfaces with fewer artifacts
- We present αSurf, a grid-based implicit surface representation with decoupled geometry, opacity, and appearance
- We develop closed-form intersection finding and differentiable alpha compositing to optimize the surface via photometric loss
- Our representation utilizes coarse initialization from efficient Plenoxels
- We incorporate truncated rendering and additional surface regularizations to reconstruct high-quality surfaces for semi-transparent objects and thin structures with heavy blending effects
- We determine the ray-surface intersections through the analytical solution of cubic polynomials
- We use the same delayed exponential learning rate schedule for training
- We use RMSProp optimizer for training
- We use grid search to determine the optimal hyperparameters
- We use Neumann boundary conditions when computing the surface TV loss
- We show ablation study and comparison with the Eikonal constraint regularization
- We show additional experiment details and evaluation results
- We show reconstruction results on some DTU scenes