Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Neural Radiance Fields (NeRFs) enable novel view synthesis
- NeRFs represent 3D scenes for computing image rendering
- 3D meshes are the main scene representation supported by most computer graphics and simulation pipelines
- Obtaining 3D meshes from NeRFs is an open challenge
- Proposed architecture enables easy 3D surface reconstruction from any NeRF-driven approach
- Final 3D mesh is physically accurate and can be rendered in real time
Paper Content
Introduction
- Accurate 3D scene and object reconstruction is important in robotics, photogrammetry, AR/VR
- Novel view synthesis (NVS) has made advances in recent years
- Neural radiance fields (NeRFs) is a 3D representation that emits radiance
- Related work has focused on improving NeRF in terms of image quality, robustness, training speed and rendering speed
- It is unclear how to obtain accurate 3D meshes from radiance fields
- NeRFs cannot be integrated with most computer graphics pipelines
- We introduce NeRFMeshing, an end-to-end pipeline for extracting accurate meshes from trained NeRF-based networks
- Our method produces meshes with neural colors and accurate geometry that can be rendered in real time
- Our method can be used with any NeRF, enabling to incorporate new advances
- Our model preserves the high fidelity of neural radiance fields and can be used for real-time novel view synthesis
Related work
- Neural Radiance Field (NeRF) formulation introduced in [13]
- Subsequent works have addressed limitations of original approach
- Original formulation lacks accurate underlying geometry
- Our work relies on NeRF networks trained from images
- Alternative to radiance fields is to learn Signed Distance Function (SDF)
- Our method does not rely on fixed grid template during training
- Exploit adaptive power of NeRFs to robustly represent 3D scenes
- Recent approaches advance speed and geometric accuracy of NeRFs
Method
- Overview of NeRF presented in Fig. 2
- Method for approximating surface from NeRF
- Mesh extraction and real-time rendering described in Sec. 3.3
Neural radiance fields
- Neural radiance field is a continuous mapping from 3D location and ray viewing direction to RGB color and volume density
- MLP with learnable parameters is used to model the mapping
- Positional encoding of the input is used to capture high frequencies
- Camera pose is used to determine ray in world coordinate system
- Color and volume density values are composited and accumulated depth is computed at percentile k
Surface approximation from nerf
- SSAN module creates a TSDF from NeRFs
- Rely on pre-trained NeRFs for 3D approximation and priors
- Feed 3D coordinate to SSAN to predict TSDF, normal, and 8-dimensional features
- Aim to learn a TSDF approximation that is globally accurate and smooth
- Exploit NeRF occupancy to render depth
- Use median depth and percentile values to construct 3D points
- Enforce normal smoothness and constant derivatives
- Train small appearance network to predict RGB colors
- Use losses to enforce properties of TSDF
- Train SSAN from rendered depth percentiles of a pretrained NeRF model
Mesh extraction and real-time rendering
- SSAN module converts radiance field representation of scene to distance field representation
- Surface reconstruction algorithm used is PyMCubes1 implementation of marching cubes
- Texture built using per face parametrization for unbounded scenes
- Triangle mesh geometry extracted from SSAN can be encoded in common formats
- Neural view-dependent appearance added by rasterizing precomputed texture
- Average FPS of 25 on workstation and 30 on MacBook on Blender Synthetic dataset objects
Implementation details
- Uses Instant NGP architecture for SSAN module
- Divided into two branches: geometry and appearance
- Hash table size 2 19, coarsest resolution 16, highest resolution 2048, 15 levels, 2 feature dimensions per entry
- Appearance network has 4 layers of MLP with width of 32
- Uses JAX framework
- Can train and extract mesh end-to-end in less than an hour using 8 V100 NVIDIA GPUs
- Hyper-parameters n c = 10 and = 0.1
Evaluation
- Validated effectiveness of approach on synthetic and real scenes
- Easily integrated to NeRF pipelines to improve accuracy
Synthetic blender scenes
- Focused on Synthetic Blender dataset
- Compared to state of the art baselines
- Visualized mesh normals and depth absolute difference
- Meshes obtained using method have smoother, more accurate and realistic geometry
- Used vertex based feature representation and grid of size 1024
- Used higher resolution of 2048 for drums and ficus
- Measured Chamfer Distance and normal consistency
- MobileNeRF produces “triangle soup”
- Pretrained NeRFs lead to more floaters and surface not well aligned with ground truth
- Rendering quality remains high and close to ground truth
- Appropriate NeRF based backbone can improve quality of final mesh
Real unbounded scenes
- Evaluated on publicly available scenes from Mip-NeRF 360
- Used Mip-NeRF 360 accelerated with NGP from [14] as NeRF backbone architecture
- No ground truth geometry available, so evaluated using mesh extracted from foreground and background
- Our method produces more accurate geometry compared to MobileNeRF and Mip-Nerf 360
- Rendered novel views of high fidelity, rendered real-time on commodity hardware
Physics-based applications
- Method produces accurate 3D meshes
- Applications such as scene editing and physics simulation can be conducted with traditional graphics and simulation pipelines
- Fig. 1 shows simple scene editing combining meshes from different datasets
- Cloth simulation from Blender Synthetic dataset shown in Fig. 1
Conclusion
- Propose a novel approach to extract geometrically accurate meshes from NeRF based architectures
- Can be trained from any NeRF architecture without significant penalty in training time
- Rendered at high frame rates on commodity hardware
- Geometric accuracy allows for quick visualization and use in physically accurate settings