Abstract

  • Neural Radiance Fields (NeRFs) have shown good results on novel view synthesis tasks.
  • NeRFs learn a scene’s color and density fields by minimizing the photometric discrepancy between training views and differentiable renders of the scene.
  • NeRFs can generate novel views from arbitrary camera positions, but tend to produce artifacts when trained with few input views.
  • A denoising diffusion model (DDM) is used to learn a prior over scene geometry and color.
  • The DDM is used to predict the gradient of the logarithm of a joint probability distribution of color and depth patches.
  • During NeRF training, the estimated gradients of the log-likelihood are backpropagated to the color and density fields.
  • Evaluations show improved quality in the reconstructed geometry and improved generalization to novel views.

Paper Content

Introduction

  • Neural radiance fields, neural implicit surfaces, and coordinate-based scene representations are useful for 3D reconstruction tasks.
  • NeRFs predict density and color given a 3D point and a viewing direction.
  • NeRFs can generate low-quality and physically implausible geometries and surface appearances.
  • Hand-crafted regularizers and learned priors have been proposed to tackle these issues.
  • This work leverages denoising diffusion models as a learned prior over color and geometry.
  • The geometry of a scene can be modeled as a density field, occupancy field, or signed distance field
  • NeRFs use a multi-layer perceptron to represent geometry
  • Positional encoding of coordinates allows modeling of high-frequency density signals
  • Plenoxels encodes scalar opacity and spherical harmonic coefficients in a sparse voxel representation
  • Neural Sparse Voxel Fields stores feature encodings in a sparse voxel octree structure
  • MVSNeRF predicts a volume of feature encodings with 3D CNNs
  • Instant Neural Graphics Primitives uses multi-scale hash tables to store feature encodings
  • Mip-NeRF 360 proposes a density regularizer to encourage compactness of the density
  • Hand-crafted loss terms can be used to regularize NeRFs
  • RegNeRF uses a normalizing flow model as a learned prior over 2D RGB patches
  • Denoising Diffusion Models learn to estimate gradients of the log data distribution
  • DDMs have been used to learn and sample from data distributions such as images, video, and speech
  • DreamFusion uses DDMs to guide the optimization of NeRFs to match input text
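
The positional encoding mentioned in the list above can be illustrated with a short sketch. It follows the frequency encoding of the original NeRF formulation; the function names and the default `num_freqs` are illustrative choices and are not taken from the summarized paper, whose NeRF model uses Instant NGP hash encodings instead.

```python
import math

def positional_encoding(x, num_freqs=4):
    """Map a scalar to [sin(2^k * pi * x), cos(2^k * pi * x)] for k < num_freqs."""
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * x))
        features.append(math.cos(freq * x))
    return features

def encode_point(point, num_freqs=4):
    """Encode each coordinate of a 3D point and concatenate the results."""
    encoded = []
    for coord in point:
        encoded.extend(positional_encoding(coord, num_freqs))
    return encoded
```

Each coordinate yields `2 * num_freqs` features, so a 3D point with `num_freqs=4` becomes a 24-dimensional input. The higher-frequency terms are what let an MLP fit sharp changes in density.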

Method

  • NeRF and DDM training are reviewed first as preliminaries
  • DDMs are related to the gradient of the log-likelihood of the data
  • DDMs are incorporated as NeRF regularizers

NeRFs

  • Optimizing a density and color field to synthesize views of a scene from arbitrary cameras
  • Estimating the expected color and depth of a ray
  • Regularizing the weights of color contributions to have a compact distribution
  • Penalizing the placement of density that is visible only from one frustum
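
The expected color and depth of a ray, and the compactness regularizer on the ray weights, can be sketched as below. This is a minimal illustration under stated assumptions: depth is taken at interval midpoints, the regularizer follows the distortion loss of Mip-NeRF 360 applied to unnormalized distances, and all function and argument names are hypothetical.

```python
import math

def render_ray(sigmas, colors, t_vals):
    """Composite per-sample densities and colors into expected color and depth.

    sigmas: densities at N samples; colors: N RGB tuples;
    t_vals: N + 1 increasing distances bounding the N intervals.
    """
    transmittance = 1.0
    weights = []
    for i, sigma in enumerate(sigmas):
        delta = t_vals[i + 1] - t_vals[i]
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of interval i
        weights.append(transmittance * alpha)    # contribution of interval i
        transmittance *= 1.0 - alpha             # light remaining after interval i
    mids = [0.5 * (t_vals[i] + t_vals[i + 1]) for i in range(len(sigmas))]
    color = tuple(sum(w * c[ch] for w, c in zip(weights, colors)) for ch in range(3))
    depth = sum(w * m for w, m in zip(weights, mids))
    return color, depth, weights

def distortion_loss(weights, t_vals):
    """Penalize weights that are spread out along the ray (Mip-NeRF 360 style)."""
    mids = [0.5 * (t_vals[i] + t_vals[i + 1]) for i in range(len(weights))]
    pairwise = sum(
        wi * wj * abs(mi - mj)
        for wi, mi in zip(weights, mids)
        for wj, mj in zip(weights, mids)
    )
    within = sum(w * w * (t_vals[i + 1] - t_vals[i]) for i, w in enumerate(weights)) / 3.0
    return pairwise + within
```

The loss is small when the weight mass sits in one narrow interval and grows when it is split across distant intervals, which is the sense in which it encourages a compact distribution.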

Score functions and DDMs

  • Bayes’ theorem is used to calculate the probability of density and color fields given training views
  • Stochastic gradient descent is used to maximize the probability
  • Explicit computation of the probabilities is not required
  • Forward diffusion process adds small Gaussian noise to a data sample
  • Variance controls the noise schedule
  • Reverse diffusion process is learned by training a neural network to estimate noise given a noised input and noise-level
  • Noise estimator has a connection to score matching
  • Moving in the negative direction of the noise predicted by the model is equivalent to moving towards the modes of the data distribution
  • DDM is used to regularize NeRF reconstructions
  • DDM is trained using Hypersim dataset
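
The link between noise estimation and the score can be checked on a toy case. The sketch below assumes a variance-preserving forward process with cumulative signal level `alpha_bar` and clean data drawn from a 1D standard normal, for which the ideal noise estimate is known in closed form. None of these names come from the paper, and a trained network would take the place of `optimal_noise_estimate`.

```python
import math
import random

def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.gauss(0.0, 1.0)
    x_t = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

def optimal_noise_estimate(x_t, alpha_bar):
    """E[eps | x_t] when the clean data is a 1D standard normal (toy assumption)."""
    return math.sqrt(1.0 - alpha_bar) * x_t

def score_from_noise(eps_hat, alpha_bar):
    """Convert a noise estimate into an estimate of d/dx log p_t(x)."""
    return -eps_hat / math.sqrt(1.0 - alpha_bar)

rng = random.Random(0)
x_t, _ = forward_diffuse(rng.gauss(0.0, 1.0), alpha_bar=0.3, rng=rng)
score = score_from_noise(optimal_noise_estimate(x_t, 0.3), 0.3)
# In this toy case the noised marginal is again a standard normal,
# whose score at x_t is -x_t, so the two values agree.
```

Stepping along `score`, that is, against the predicted noise, moves `x_t` towards 0, the mode of the toy distribution.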

Regularizing NeRFs with DDMs

  • Loss function is based on log-posterior
  • Diffusion model used as prior over (σ, c)
  • Loss function gradient is based on eq. 10 and eq. 20
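
The decomposition behind this loss can be written out as a sketch. The notation is introduced here for illustration and is not copied from the paper's numbered equations: I denotes the set of training views and θ the parameters of the density and color fields.

```latex
\begin{align}
\log p(\sigma, c \mid I) &= \log p(I \mid \sigma, c) + \log p(\sigma, c) - \log p(I), \\
\nabla_{\theta} \log p(\sigma, c \mid I) &= \nabla_{\theta} \log p(I \mid \sigma, c) + \nabla_{\theta} \log p(\sigma, c).
\end{align}
```

The evidence term log p(I) does not depend on θ, so it drops out of the gradient. The likelihood term corresponds to the photometric loss on the training views, and the prior term is the part whose gradient the diffusion model estimates.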

Implementation details

  • Trained the DDM using the training protocol of [8,39]
  • Optimized DDM for 650,000 steps with batch size 32 on 1 GPU
  • Used torch-ngp [36] implementation of Instant NGP [19] with tiny-cuda-nn [18] back-end as NeRF model
  • NeRFs optimized for 12,000 steps
  • τ smoothly interpolates from 0.1 to 0
  • λ_dist linearly increases from 0 until it reaches its maximum value at 8000 steps
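
The two schedules above can be sketched as follows. The summary does not state the maximum value of the distortion weight, the exact easing curve for τ, or the step range over which τ is annealed, so `max_value` is a caller-supplied placeholder, the cosine easing is an assumption, and τ is assumed to anneal over the full 12,000 steps.

```python
import math

def lambda_dist_schedule(step, max_value, warmup_steps=8000):
    """Linear ramp from 0 to max_value over warmup_steps, then constant."""
    frac = min(max(step / warmup_steps, 0.0), 1.0)
    return max_value * frac

def tau_schedule(step, total_steps=12000, start=0.1, end=0.0):
    """Smooth interpolation from start to end (cosine easing is an assumption)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    ease = 0.5 * (1.0 - math.cos(math.pi * frac))
    return start + (end - start) * ease
```

A training loop would call both functions once per step and use the returned values as the current loss weight and the current τ.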

Experiments

  • LLFF dataset has 8 scenes with 20-62 images per scene
  • DTU dataset consists of images of objects placed on a table against a black background
  • Evaluations use image similarity metrics such as PSNR, SSIM and LPIPS
  • Geometry estimated by the density field of the NeRF is compared to the ground-truth point cloud using the Chamfer-L1 distance
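
Two of these metrics are simple enough to sketch directly. The code below shows PSNR on flattened pixel values and one common definition of the Chamfer distance as the mean of accuracy and completeness with Euclidean nearest-neighbor distances. The exact variant and evaluation protocol used in the paper may differ, and the function names are illustrative.

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

def _mean_nearest_distance(src, dst):
    """Average distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def chamfer_distance(pred_points, gt_points):
    """Return (chamfer, accuracy, completeness) for two point clouds."""
    accuracy = _mean_nearest_distance(pred_points, gt_points)      # prediction -> ground truth
    completeness = _mean_nearest_distance(gt_points, pred_points)  # ground truth -> prediction
    return 0.5 * (accuracy + completeness), accuracy, completeness
```

The brute-force nearest-neighbor search is quadratic in the number of points; a spatial index such as a k-d tree would be the usual choice for full-size point clouds.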

Evaluation on novel view synthesis task

  • Both the geometric baseline and the DDM-regularized model outperform state-of-the-art (SOTA) methods on the LLFF dataset
  • Regularizer has a large impact on the final result when number of views is low
  • Geometric baseline has higher metrics but artifacts can be seen in generated test views
  • Diffusion model-based method generates more plausible depths
  • Test views contain parts of the scene not visible in training views
  • DDM shows good generalization to object-centric reconstruction task
  • Density-based method performs adequately compared to occupancy and SDF-based methods

Evaluation on reconstruction task

  • Density-based methods struggle with shiny objects
  • Higher fidelity geometry on diffuse and textured surfaces
  • Textured regions alone are not sufficient for high quality output
  • Geometric baseline struggles to complete geometry of a house
  • DDM provides a complementary signal to geometric regularizers
  • DDM improves DTU scores
  • DDM introduces details in areas not pictured in the training views
  • Model trained on 24x24 patches outperforms model trained on 48x48 patches
  • Feeding patches from input images to DDM 25% of the time during NeRF fitting is important
  • Reducing amount of training data for DDM slightly reduces scores
  • DDM used to regularize 3D voxel grid of densities, density weights sampled along ray, etc.
  • DDM used to regularize NeRFs, improves performance on novel view synthesis and 3D reconstruction
  • DDM can be used for other tasks optimized with gradient descent