Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Neural Radiance Fields (NeRFs) enable novel view synthesis
  • NeRFs represent 3D scenes for computing image rendering
  • 3D meshes are the main scene representation supported by most computer graphics and simulation pipelines
  • Obtaining 3D meshes from NeRFs is an open challenge
  • Proposed architecture enables easy 3D surface reconstruction from any NeRF-driven approach
  • Final 3D mesh is physically accurate and can be rendered in real time

Paper Content

Introduction

  • Accurate 3D scene and object reconstruction is important in robotics, photogrammetry, AR/VR
  • Novel view synthesis (NVS) has made advances in recent years
  • Neural radiance fields (NeRFs) represent a scene as a 3D field that emits radiance
  • Related work has focused on improving NeRF in terms of image quality, robustness, training speed and rendering speed
  • It is unclear how to obtain accurate 3D meshes from radiance fields
  • NeRFs cannot be integrated with most computer graphics pipelines
  • We introduce NeRFMeshing, an end-to-end pipeline for extracting accurate meshes from trained NeRF-based networks
  • Our method produces meshes with neural colors and accurate geometry that can be rendered in real time
  • Our method can be used with any NeRF, making it easy to incorporate new advances
  • Our model preserves the high fidelity of neural radiance fields and can be used for real-time novel view synthesis
  • Neural Radiance Field (NeRF) formulation introduced in [13]
  • Subsequent works have addressed limitations of original approach
  • Original formulation lacks accurate underlying geometry
  • Our work relies on NeRF networks trained from images
  • Alternative to radiance fields is to learn Signed Distance Function (SDF)
  • Our method does not rely on fixed grid template during training
  • Exploit adaptive power of NeRFs to robustly represent 3D scenes
  • Recent approaches advance speed and geometric accuracy of NeRFs

Method

  • Overview of NeRF presented in Fig. 2
  • Method for approximating surface from NeRF
  • Mesh extraction and real-time rendering described in Sec. 3.3

Neural radiance fields

  • Neural radiance field is a continuous mapping from 3D location and ray viewing direction to RGB color and volume density
  • MLP with learnable parameters is used to model the mapping
  • Positional encoding of the input is used to capture high frequencies
  • Camera pose is used to determine ray in world coordinate system
  • Color and volume density values are composited along each ray, and the accumulated depth is computed at percentile k
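
The compositing step can be illustrated with a minimal sketch for a single ray. It assumes the standard NeRF quadrature (alpha from density and sample spacing, weights from accumulated transmittance) and assumes that "depth at percentile k" means the first sample at which the accumulated weight reaches k. The function name `composite_ray` and its arguments are hypothetical, not taken from the paper.

```python
import math

def composite_ray(densities, colors, t_edges, k=0.5):
    """Composite samples along one ray (illustrative sketch only).

    densities: per-sample volume density, length n
    colors:    per-sample RGB triples, length n
    t_edges:   sample bin edges along the ray, length n + 1
    k:         accumulated-weight threshold for the percentile depth (assumption)
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    accumulated = 0.0            # running sum of compositing weights
    depth_k = None               # stays None if the ray never reaches k
    for i, sigma in enumerate(densities):
        delta = t_edges[i + 1] - t_edges[i]
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for c in range(3):
            rgb[c] += weight * colors[i][c]
        accumulated += weight
        if depth_k is None and accumulated >= k:
            depth_k = 0.5 * (t_edges[i] + t_edges[i + 1])  # bin midpoint
        transmittance *= 1.0 - alpha
    return rgb, depth_k, accumulated

# Usage: empty space followed by a nearly opaque red sample.
rgb, median_depth, opacity = composite_ray(
    [0.0, 0.0, 1e3, 0.0],
    [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0]],
    [0.0, 1.0, 2.0, 3.0, 4.0],
    k=0.5,
)
```

With k = 0.5 this yields a median depth; other values of k give depths slightly in front of or behind the surface.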

Surface approximation from NeRF

  • The Signed Surface Approximation Network (SSAN) module creates a truncated signed distance field (TSDF) from NeRFs
  • Rely on pre-trained NeRFs for 3D approximation and priors
  • Feed 3D coordinate to SSAN to predict TSDF, normal, and 8-dimensional features
  • Aim to learn a TSDF approximation that is globally accurate and smooth
  • Exploit NeRF occupancy to render depth
  • Use median depth and percentile values to construct 3D points
  • Enforce normal smoothness and constant derivatives
  • Train small appearance network to predict RGB colors
  • Use losses to enforce properties of TSDF
  • Train SSAN from rendered depth percentiles of a pretrained NeRF model
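
One way to picture how rendered depths could supervise a TSDF is the sketch below. It is an assumption, not the paper's loss: it uses a simple projective approximation in which the signed distance of a ray sample is the rendered surface depth minus the sample depth, clamped to a truncation band. The names `point_on_ray`, `tsdf_target` and the parameter `trunc` are hypothetical.

```python
def point_on_ray(origin, direction, t):
    """Return the 3D point origin + t * direction."""
    return [o + t * d for o, d in zip(origin, direction)]

def tsdf_target(t_sample, t_surface, trunc):
    """Projective TSDF target for a sample on a ray (illustrative assumption).

    Positive in front of the surface, negative behind it,
    clamped to the interval [-trunc, trunc].
    """
    signed = t_surface - t_sample
    return max(-trunc, min(trunc, signed))

# Usage: a ray looking down +z whose rendered median depth is 2.0.
origin, direction = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
surface_point = point_on_ray(origin, direction, 2.0)
targets = [tsdf_target(t, 2.0, trunc=0.25) for t in (1.0, 1.9, 2.0, 2.1, 3.0)]
```

Points built from the median depth would then sit at a target of zero, while points built from lower and higher percentiles fall on the positive and negative sides of the surface.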

Mesh extraction and real-time rendering

  • SSAN module converts radiance field representation of scene to distance field representation
  • Surface reconstruction algorithm used is the PyMCubes implementation of marching cubes
  • Texture built using per-face parametrization for unbounded scenes
  • Triangle mesh geometry extracted from SSAN can be encoded in common formats
  • Neural view-dependent appearance added by rasterizing precomputed texture
  • Average FPS of 25 on workstation and 30 on MacBook on Blender Synthetic dataset objects
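
As an illustration of encoding the extracted geometry in a common format, the sketch below serializes a triangle mesh to Wavefront OBJ text. It assumes the vertices and faces have already been produced by a marching-cubes step; the function name `mesh_to_obj` is hypothetical and texture coordinates are left out.

```python
def mesh_to_obj(vertices, faces):
    """Serialize a triangle mesh to Wavefront OBJ text.

    vertices: iterable of (x, y, z) coordinates
    faces:    iterable of (i, j, k) zero-based vertex indices
    """
    lines = [f"v {x:.6f} {y:.6f} {z:.6f}" for x, y, z in vertices]
    # OBJ face indices are one-based, so shift every index by one.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines) + "\n"

# Usage: a single triangle in the z = 0 plane.
obj_text = mesh_to_obj([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```

The resulting text can be written to a `.obj` file and opened in standard graphics tools.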

Implementation details

  • Uses Instant NGP architecture for SSAN module
  • Divided into two branches: geometry and appearance
  • Hash table size 2^19, coarsest resolution 16, highest resolution 2048, 15 levels, 2 feature dimensions per entry
  • Appearance network has 4 layers of MLP with width of 32
  • Uses JAX framework
  • Can train and extract mesh end-to-end in less than an hour using 8 V100 NVIDIA GPUs
  • Hyper-parameters n_c = 10 and [symbol missing] = 0.1
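
To make the hash-grid settings concrete, the sketch below computes per-level resolutions and table entries. It assumes the level schedule from the Instant NGP paper (geometric growth between the coarsest and finest resolution, with a level stored densely when its grid fits in the hash table); whether the SSAN follows this schedule exactly is an assumption. The function name `hash_grid_levels` is hypothetical.

```python
import math

def hash_grid_levels(n_min=16, n_max=2048, num_levels=15,
                     log2_table_size=19, features=2):
    """Return (resolution, table entries, parameters) for each grid level."""
    # Per-level growth factor so that the last level reaches n_max.
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    table_size = 2 ** log2_table_size
    levels = []
    for level in range(num_levels):
        # Small epsilon guards against floating-point error before flooring.
        resolution = math.floor(n_min * growth ** level + 1e-9)
        # A level is dense if its (resolution + 1)^3 vertices fit in the table.
        entries = min((resolution + 1) ** 3, table_size)
        levels.append((resolution, entries, entries * features))
    return levels

levels = hash_grid_levels()
total_parameters = sum(params for _, _, params in levels)
```

With these settings the growth factor is the square root of 2, so the resolution doubles every two levels.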

Evaluation

  • Validated effectiveness of approach on synthetic and real scenes
  • Easily integrated into NeRF pipelines to improve accuracy

Synthetic Blender scenes

  • Focused on Synthetic Blender dataset
  • Compared to state of the art baselines
  • Visualized mesh normals and depth absolute difference
  • Meshes obtained using method have smoother, more accurate and realistic geometry
  • Used vertex based feature representation and grid of size 1024
  • Used higher resolution of 2048 for drums and ficus
  • Measured Chamfer Distance and normal consistency
  • MobileNeRF produces “triangle soup”
  • Pretrained NeRFs lead to more floaters and surfaces that are not well aligned with the ground truth
  • Rendering quality remains high and close to ground truth
  • Appropriate NeRF based backbone can improve quality of final mesh
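
The two geometry metrics named above can be sketched on small point sets as follows. Conventions vary between papers (squared or unsquared distances, sums or means), so this is one common variant and not necessarily the one used in the evaluation. It assumes points sampled from each mesh with unit-length normals, and uses brute-force nearest-neighbor search for clarity; the function names are hypothetical.

```python
import math

def _nearest_index(point, points):
    """Index of the element of `points` closest to `point` (brute force)."""
    return min(range(len(points)), key=lambda j: math.dist(point, points[j]))

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    a_to_b = sum(math.dist(p, b[_nearest_index(p, b)]) for p in a) / len(a)
    b_to_a = sum(math.dist(q, a[_nearest_index(q, a)]) for q in b) / len(b)
    return a_to_b + b_to_a

def normal_consistency(a, normals_a, b, normals_b):
    """Mean absolute dot product between each normal and its nearest neighbor's normal."""
    def one_way(src, n_src, dst, n_dst):
        total = 0.0
        for p, n in zip(src, n_src):
            m = n_dst[_nearest_index(p, dst)]
            total += abs(sum(x * y for x, y in zip(n, m)))
        return total / len(src)
    return 0.5 * (one_way(a, normals_a, b, normals_b)
                  + one_way(b, normals_b, a, normals_a))

# Usage: two parallel two-point "surfaces" one unit apart.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
truth = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
up = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
cd = chamfer_distance(pred, truth)
nc = normal_consistency(pred, up, truth, up)
```

Lower Chamfer distance and higher normal consistency indicate a mesh closer to the ground truth.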

Real unbounded scenes

  • Evaluated on publicly available scenes from Mip-NeRF 360
  • Used Mip-NeRF 360 accelerated with NGP from [14] as NeRF backbone architecture
  • No ground truth geometry available, so evaluated using mesh extracted from foreground and background
  • Our method produces more accurate geometry compared to MobileNeRF and Mip-NeRF 360
  • Rendered novel views of high fidelity, rendered real-time on commodity hardware

Physics-based applications

  • Method produces accurate 3D meshes
  • Applications such as scene editing and physics simulation can be conducted with traditional graphics and simulation pipelines
  • Fig. 1 shows simple scene editing combining meshes from different datasets
  • Cloth simulation from Blender Synthetic dataset shown in Fig. 1

Conclusion

  • Propose a novel approach to extract geometrically accurate meshes from NeRF based architectures
  • Can be trained from any NeRF architecture without significant penalty in training time
  • Rendered at high frame rates on commodity hardware
  • Geometric accuracy allows for quick visualization and use in physically accurate settings