Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Presents I$^2$-SDF, a method for intrinsic indoor scene reconstruction and editing
- Uses differentiable Monte Carlo raytracing on neural signed distance fields
- Jointly recovers shapes, incident radiance and materials from multi-view images
- Introduces a novel bubble loss and error-guided adaptive sampling scheme
- Decomposes neural radiance field into spatially-varying material of the scene
- Demonstrates superior quality on indoor scene reconstruction, novel view synthesis, and scene editing
Paper Content
Introduction
- Reconstructing 3D scenes from multi-view images is a fundamental task in computer science
- Neural Radiance Field (NeRF) uses MLPs to approximate the underlying geometry and appearance of a 3D scene
- Novel view synthesis is insufficient for scene editing applications
- Inverse rendering or intrinsic decomposition reconstructs and decomposes the scene into shape, shading and surface reflectance
- Complex indoor scenes are difficult to reconstruct
- I 2 -SDF is a new method to decompose a 3D scene into its underlying shape, material, and incident radiance components
- Bubble loss and error-guided adaptive sampling improve reconstruction quality on small objects
- I 2 -SDF enables photorealistic indoor scene relighting and editing
- High-quality synthetic indoor scene multi-view dataset provided
Related work
- Neural implicit scene representations are used to represent 3D geometry and radiance information
- Neural radiance field (NeRF) uses a single MLP to encode a scene as a continuous volumetric field
- Follow-up works accelerate reconstruction speed using voxels, hashgrids or deep image features
- Neural fields can also be applied to represent 3D geometric functions
- Difficulties in handling shape-radiance ambiguity on texture-less surfaces
- Traditional multi-view stereo methods struggle with texture-less regions
- Learning-based MVS methods divided into two categories: depth-based and TSDF-based
- Neural implicit SDF methods used to tackle texture-less regions
- Inverse rendering attempts to reconstruct and factorize the scene with geometry, material and lighting
- Neural implicit representations used to estimate BRDF and lighting from image collections
- Recent methods mainly focus on single object reconstruction and do not handle spatially-varying lighting conditions
Overview
- Goal is to decompose shape, radiance and material of indoor scene according to multi-view input images
- Implicit representations used to model geometry, radiance and material
- Pipeline consists of neural SDF field, neural radiance field, neural material fields and emission field
- Two-stage training scheme used to avoid training ambiguities
Implicit neural surface representation and volume rendering
- Represent scene geometry as an implicit signed distance function (SDF)
- SDF maps 3D point to closest distance to surface
- Parameterize SDF and scene appearance as MLP
- Use differentiable volume rendering to learn scene implicit representation from images
- Color, depth, and normal of surface can be accumulated
Intrinsics decomposition
- Indoor scenes contain objects of different scales and visibility levels
- Existing indoor reconstruction methods often fail to recognize and reconstruct thin or suspended objects
- Neural networks tend to converge faster on low-frequency information than high-frequency information
- Gradients for small objects can vanish due to the nature of neural networks
- To address this problem, “bubbles” are inserted to create gradients for SDF near small or thin objects
- Bubble loss is used to minimize the absolute SDF value of surface points
- Importance sampling algorithm is used to filter out large planar areas and preserve small-object areas
- Geometry loss is used to approximate the geometry field
- Depth and normal priors are used to handle shape-radiance ambiguity
- Smoothness loss is used to encourage smooth surface reconstruction
Emitter semantic field
- Radiance field F c is trained from LDR images, causing under-estimation of light intensity from emitters.
- Neural emitter semantic field F e is introduced to optimize radiance value emitted from light sources.
- K-Means algorithm is used to cluster emitter points and an array L[•] is defined to model HDR emissions.
Material field
- Parameterize spatially-varying material of scene as neural field
- Use GGX microfacet BRDF model to present scene material
- Model albedo and roughness of scene with two MLPs
- Estimate material parameter associated with ray using volumetric accumulation
- Enforce physical correctness for predicted material parameters with regularizations
Differentiable monte carlo raytracing
- Scene appearance can be re-rendered using surface rendering algorithms
- Raytracing is performed in a 3D volumetric space
- Monte Carlo rendering technique is used to perform the scene re-rendering
- Surface color is rendered by Monte Carlo integration
Training
- Training of geometry and radiance fields is done in an end-to-end manner
- Loss is calculated between 3D neural representation and 2D images
- Training is done in 3 steps: warm-up, bubble, and smooth
- Training of material and emission fields is done after geometry and radiance networks are pretrained
- Material network is weakly supervised by re-rendering results
Experiments
- Analyze and compare method with state-of-the-art methods
- Demonstrate qualitative scene editing and relighting results
- Perform ablation studies to prove effectiveness
- Propose new synthetic multi-view indoor scene dataset
- Compare on mesh-based metrics and image-space geometry errors
- Compare against state-of-the-art neural reconstruction, multi-view stereo and novel view synthesis methods
- Our method outperforms all baselines
Scene editing
- Decomposition results enable photo-realistic scene editing tasks such as material editing and relighting
- Physically-based rendering algorithm produces photo-realistic lighting effects such as specular reflections
- Robustness on inaccurate depth information
- Effectiveness of adaptive sampling strategy
- Proposes I2-SDF to reconstruct an intrinsic neural scene from multi-view images
- Novel bubbling strategy recovers small objects in large-scale scenes
- Limitations include MLP-based network backbone and time-consuming MC raytracing
- Calculate PDF value corresponding to view and lighting direction
- Visualization of error-guided sampling map during training
- Outperforms baselines in novel view synthesis
- Raytracing algorithm casts shadows of inserted object
- Quantitative comparisons of novel view synthesis and geometric reconstruction results