Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Neural Radiance Fields (NeRFs) have shown good results on novel view synthesis tasks.
- NeRFs learn a scene’s color and density fields by minimizing the photometric discrepancy between training views and differentiable renders of the scene.
- NeRFs can generate novel views from arbitrary camera positions, but can lead to artifacts when trained with few input views.
- A denoising diffusion model (DDM) is used to learn a prior over scene geometry and color.
- The DDM is used to predict the gradient of the logarithm of a joint probability distribution of color and depth patches.
- During NeRF training, the estimated gradients of the log-likelihood are backpropagated to the color and density fields.
- Evaluations show improved quality in the reconstructed geometry and improved generalization to novel views.
Paper Content
Introduction
- Neural radiance fields, neural implicit surfaces, and coordinate-based scene representations are useful for 3D reconstruction tasks.
- NeRFs predict density and color when given 3D point and viewing direction.
- NeRFs can generate low-quality and physically implausible geometries and surface appearances.
- Hand-crafted regularizers and learned priors have been proposed to tackle these issues.
- Leveraging denoising diffusion models as a learned prior over color and geometry.
Related work
- Geometry of scene can be modeled as a density field, occupancy field, or signed distance field
- NeRFs use a multi-layer perceptron to represent geometry
- Positional encoding of coordinates allows modeling of high-frequency density signals
- Plenoxels encodes scalar opacity and spherical harmonic coefficients in a sparse voxel representation
- Neural Sparse Voxel Fields stores feature encodings in a sparse voxel octree structure
- MVSNeRF predicts a volume of feature encodings with 3D CNNs
- Instant Neural Graphics Primitives uses multi-scale hash tables to store feature encodings
- Mip-NeRF 360 proposes a density regularizer to encourage compactness of the density
- Hand-crafted loss terms can be used to regularize NeRFs
- RegNeRF uses a normalizing flow model as a learned prior over 2D RGB patches
- Denoising Diffusion Models learn to estimate gradients of the log data distribution
- DDMs have been used to learn and sample from data distributions such as images, video, and speech
- Dreamfusion uses DDMs to guide optimization of NeRFs to match input text
Method
- NeRF and DDM training are preliminaries
- DDMs are related to the gradient of the log-likelihood of the data
- DDMs are incorporated as NeRF regularizers
Nerfs
- Optimizing a density and color field to synthesize views of a scene from arbitrary cameras
- Estimating the expected color and depth of a ray
- Regularizing the weights of color contributions to have a compact distribution
- Penalizing the placement of density that is visible only from one frustum
Score functions and ddms
- Bayes’ theorem is used to calculate the probability of density and color fields given training views
- Stochastic gradient descent is used to maximize the probability
- Explicit computation of the probabilities is not required
- Forward diffusion process adds small Gaussian noise to a data sample
- Variance controls the noise schedule
- Reverse diffusion process is learned by training a neural network to estimate noise given a noised input and noise-level
- Noise estimator has a connection to score matching
- Negative direction of noise predicted by the model is equivalent to moving towards the modes of the data distribution
- DDM is used to regularize NeRF reconstructions
- DDM is trained using Hypersim dataset
Regularizing nerfs with ddms
- Loss function is based on log-posterior
- Diffusion model used as prior over (σ, c)
- Loss function gradient is based on eq. 10 and eq. 20
Implementation details
- Trained DDM model using training protocol of [8,39]
- Optimized DDM for 650,000 steps with batch size 32 on 1 GPU
- Used torch-ngp [36] implementation of Instant NGP [19] with tiny-cuda-nn [18] back-end as NeRF model
- NeRFs optimized for 12,000 steps
- τ smoothly interpolates from 0.1 to 0
- λ dist linearly increases from 0 until it reaches maximum value at 8000 steps
Experiments
- LLFF dataset has 8 scenes with 20-62 images per scene
- DTU dataset consists of images of objects placed on a table against black background
- Evaluations use image similarity metrics such as PSNR, SSIM and LPIPS
- Geometry estimated by density field of NeRF is compared to ground truth point cloud using chamfer L1 distance
Evaluation on novel view synthesis task
- Geometric baseline and model perform better than SOTA methods on LLFF dataset
- Regularizer has a large impact on the final result when number of views is low
- Geometric baseline has higher metrics but artifacts can be seen in generated test views
- Diffusion model-based method generates more plausible depths
- Test views contain parts of the scene not visible in training views
- DDM shows good generalization to object-centric reconstruction task
- Density-based method performs adequately compared to occupancy and SDF-based methods
Evaluation on reconstruction task
- Density based methods struggle with shiny objects
- Higher fidelity geometry on diffuse and textured surfaces
- Textured regions alone are not sufficient for high quality output
- Geometric baseline struggles to complete geometry of a house
- DDM model provides complementary signal to geometric regularizers
- DDM model improves DTU scores
- DDM model introduces details in areas not pictured in training view
- Model trained on 24x24 patches outperforms model trained on 48x48 patches
- Feeding patches from input images to DDM 25% of the time during NeRF fitting is important
- Reducing amount of training data for DDM slightly reduces scores
- DDM used to regularize 3D voxel grid of densities, density weights sampled along ray, etc.
- DDM used to regularize NeRFs, improves performance on novel view synthesis and 3D reconstruction
- DDM can be used for other tasks optimized with gradient descent