Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Proposes a new challenge for synthesizing a novel view in a practical environment with limited input images and significant illumination variations
- Suggests ExtremeNeRF to address the problem, which utilizes occlusion-aware multiview albedo consistency, supported by geometric alignment and depth consistency
- Extracts intrinsic image components that are illumination-invariant across different views
- Provides extensive experimental results for an evaluation of the task using the newly built NeRF Extreme benchmark
Paper Content
Introduction
- Neural radiance fields (NeRF) have had significant impacts on 3D scene reconstruction and novel view synthesis
- Various aspects of NeRF have been improved, such as generalization ability, representation ability, and practicality
- ExtremeNeRF provides reliable novel view synthesis results compared to the state-of-the-art method
- NeRF often struggles to render large-size patches due to complexity
- ExtremeNeRF enforces consistency among intrinsic components between input and rendered views
- A new NeRF Extreme dataset has been proposed, which is the first in-the-wild multi-view dataset with both indoor and outdoor scenes taken under varying illumination
- ExtremeNeRF provides plausible few-shot view synthesis and video rendering results with fewer inputs and varying illumination
Related work
- Few-shot NeRF leverages knowledge priors to improve performance
- Depth and geometry priors are used
- NeRF-W enables view synthesis of internet photos taken under varying illumination
- Tancik et al. presented a way to synthesize a large-scale city scene captured under varying lighting conditions
- Existing datasets focus on varying single attributes like viewing direction or illumination
- Li et al. disentangled intrinsic components without supervision
- Ye et al. suggested a NeRF framework for scene editing
Preliminaries
- Neural radiance fields (NeRF) is a volume rendering-based view-synthesis framework
- NeRF maps 5D inputs to color and volume density
- Vanila-NeRF view synthesis is done by optimizing mean squared error on synthesized color
- RegNeRF uses a depth smoothness regularization for few-shot view synthesis
- RegNeRF relies on the assumption that input images should share consistent illumination conditions
Methodology
Overview
- Objective: Build illumination-robust few-shot view synthesis framework
- Challenges: Geometric alignment, occluded regions, global context
- Solution: Offline intrinsic decomposition network, pseudoalbedo ground truth
Intrinsic consistency regularization
- Two images are randomly selected from a set of inputs and novel views for each iteration
- A projective transformation is used to get a pixel correspondence between the two images
- Albedo consistency is imposed between inputs and novel views
- Occlusion handling is used to minimize projection errors
- Depth consistency loss is used to regularize the scene geometry and reduce floating artifacts
Albedo estimation
- Intrinsic decomposition requires global context of a scene
- NeRF struggles with large-resolution inputs
- Intrinsic rendering methods cannot be used due to lack of input data
- Proposed two-stage intrinsic decomposition pipeline: FIDNet and PIDNet
- FIDNet extracts intrinsic components of input images offline
- PIDNet extracts patch-wise intrinsic components of synthesized color patch at novel view
Total loss functions
- Edge-preserving loss ensures gradients of the novel view are the same as the input view
- Intrinsic smoothness loss and depth consistency loss are formulated in the same way
- Chromaticity consistency loss ensures the chromaticity of the input patch and the extracted albedo are the same
Experiments
- Introducing NeRF Extreme, a multi-view dataset with varying illumination
- Scenes are not limited to object-centric ones
- Details of dataset statistics and experimental settings in supplementary material
Nerf extreme dataset
- Collected multi-view images with a variety of light sources
- Varied illumination by turn-on/off light sources and closing/opening curtains
- Captured outdoor scenes at different times with different sunlights
- Images taken in the wild using off-the-shelves camera on mobile phone
- Camera poses obtained using COLMAP structure-from-motion framework
- Depth maps obtained by multi-view stereo method
Experimental settings
- Framework based on JAX and RegNeRF
- Used IIDWW code and model without fine-tuning
- Experiments on DTU, NeRD, LLFF, and NeRF Extreme datasets
- Compared to mip-NeRF and RegNeRF for few-shot view synthesis
- Weak comparisons with NeRF-W, NeROIC, and other works
Evaluation metrics
- Problem of evaluating novel view synthesis under varying illumination is ill-posed
- Synthesizing scene appearances is impossible with input information
- Evaluation methods in addition to PSNR and SSIM to compare underlying characteristics of scene regardless of illumination
- Metric to measure maintenance of consistent color from synthesized image to inputs
- Absolute Relative Error to compare quality of synthesized depth
Experimental results
- Proposed ExtremeNeRF outperforms other few-shot view synthesis baselines on NeRF Extreme and light-varying DTU datasets
- ExtremeNeRF eliminates distortions from varying illumination inputs
- Cross-view consistency regularization helps maintain underlying color consistency and depth map
- Albedo consistency regularization helps regularize depth
- Occlusion handling prevents consistency regularization between pixels with occlusions
- Results prove ExtremeNeRF can maintain physical properties of target scene with sparse inputs and varying illumination
- Quantitative comparison shows ExtremeNeRF has lowest CCRD and Abs Rel
- Ablation studies show depth consistency term is beneficial for color and depth map
- NeROIC-Geom and NeROIC-Full show over-smoothed or diverged results on NeRF Extreme dataset
- Relighting results achieved by replacing shading image of synthesized scene
- Tent scene results show difficulties in regularizing intrinsic components
- Additional qualitative comparisons show ExtremeNeRF performs better than other methods in synthesizing depth