Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Neural graphics primitives can be costly to train and evaluate.
A new input encoding is used to reduce cost without sacrificing quality.
The encoding uses a small neural network and a multiresolution hash table.
The system is implemented using fully-fused CUDA kernels.
Training is done in a matter of seconds and rendering in tens of milliseconds.

Paper Content

Introduction

Computer graphics primitives are represented by mathematical functions
Representations need to be fast and compact while capturing high-frequency detail
MLPs have been used as neural graphics primitives
Encoding maps neural network inputs to higher-dimensional space
Trainable, task-specific data structures used to enable smaller, more efficient MLPs
Multiresolution hash encoding is adaptive and efficient, independent of task
Hash tables prioritize sparse areas with most important fine scale detail
Hash table lookups are O(1) and don’t require control flow

One-hot encoding and kernel trick can be used to make complex data arrangements linearly separable
Frequency encodings have been used in computer graphics tasks such as approximating the visibility function
One-blob encoding can achieve more accurate results than frequency encodings in bounded domains

Parametric encodings.

Recently, state-of-the-art results have been achieved by parametric encodings which blur the line between classical data structures and neural approaches
Parameters are arranged in an auxiliary data structure, such as a grid or a tree
Computational cost is reduced as only a small number of parameters need to be updated for each sample
Parametric models can be trained to convergence faster without sacrificing accuracy
Dense grids of trainable features consume more memory than neural network weights
Frequency encoding allows a moderately sized network to represent a scene more accurately
Natural scenes exhibit smoothness, motivating the use of a multi-resolution decomposition
Sparse parametric encodings reduce waste by allocating fewer features to areas of empty space
Spatial hash table is used to reduce parameters while maintaining reconstruction quality
Neural network is used to disambiguate hash collisions
Hash tables have predictable memory layout and can be fine-tuned for low-level architectural details

Multiresolution hash encoding

Fully connected neural network with trainable weight and encoding parameters
Encoding parameters arranged into levels, each containing up to feature vectors with dimensionality
Hyperparameters typically shown in Table 1
Steps performed in multiresolution hash encoding illustrated in Figure 3
Feature vectors at each corner -linearly interpolated according to relative position of x within its hypercube
Performance vs. quality trade-off determined by hash table size
Implicit hash collision resolution due to different resolution levels having different strengths
Recommended = 2, = 16 as default
Interpolation ensures encoding and composition with neural network are continuous
Higher-order smoothness can be achieved with low-cost approach in Appendix A

Implementation

Implemented multiresolution hash encoding in CUDA
Source code released as an update to Müller [2021]
Hash table entries stored at half precision (2 bytes per entry)
Master copy of parameters in full precision for stable mixed-precision parameter updates
Computation scheduled to look up hash tables level by level
Performance remains roughly constant with hash table size ≤ 2^19
MLP with two hidden layers of width 64 neurons
Weights initialized according to Glorot and Bengio [2010]
Hash table entries initialized using uniform distribution U (−10^-4, 10^-4)
Jointly train neural network weights and hash table entries using Adam
L2 loss for gigapixel images and NeRF, MAPE for signed distance functions, luminance-relative L2 loss for neural radiance caching
Learning rate of 10^-4 for signed distance functions and 10^-2 otherwise
Skip Adam steps for hash table entries whose gradient is 0

Experiments

Encoding is versatile and of high quality
Compared to previous encodings
Used in four distinct computer graphics primitives
Primitives benefit from encoding spatial coordinates

Gigapixel image approximation

Learning the 2D to RGB mapping of image coordinates to colors is a popular benchmark for testing a model’s ability to represent high-frequency detail.
Adaptive coordinate networks (ACORN) have shown impressive results when fitting very large images.
Multiresolution hash encoding can achieve high-fidelity images in seconds to minutes.

Signed distance functions

Signed distance functions (SDFs) are used in many applications
DeepSDF uses a large MLP to represent one or more SDFs
Spatially learned encoding can be used with a smaller MLP
NGLOD uses a lookup from an octree of trainable feature vectors
Frequency encoding is used as a baseline
NGLOD achieves highest visual reconstruction quality
Our encoding approaches similar fidelity to NGLOD with similar performance and memory cost

Neural radiance caching

Neural Radiance Caching (NRC) is a task of predicting photorealistic pixel colors from feature buffers
MLP is run independently for each pixel, so feature buffers can be treated as per-pixel feature vectors
Multiresolution hash encoding is applied to 3D coordinate x and additional features are treated as auxiliary encoded dimensions
Visualizing the NRC at the first path vertex shows that the multiresolution hash encoding results in sharper reconstruction
Performance overhead of 0.7 ms reduces frame rate from 147 to 133 FPS at a resolution of 1920 × 1080px

Neural radiance and density fields (nerf)

NeRF setting represents a volumetric shape with a 3D density function and a 5D emission function
Model architecture consists of two concatenated MLPs: a density MLP and a color MLP
Ray marching is used to place samples near surfaces
Training and rendering can be done in seconds and at 60 FPS
Results compared to NeRF, mip-NeRF, and NSVF
Speedup attributed to hash encoding
Ablation shows MLP better at capturing specular effects and resolving hash collisions
MLP only 15% more expensive than linear layer

Mic

Our NeRF implementation with multiresolution hash encoding is competitive with NeRF, mip-NeRF, and NSVF after just 15 seconds of training.
Our method performs best on scenes with high geometric detail.
mip-NeRF and NSVF outperform our method on scenes with complex, view-dependent reflections.
Our hash encoding and smaller MLP cause a 20-60x improvement over competing implementations.
Our hash encoding allows for independent, fully parallel processing of each resolution.
We chose our hash function for its good mixture of efficiency, look-up coherence, and uniform coverage.
We prefer concatenation over reduction for the encoded result.
We experimented with three other hash functions, but none yielded a higher-quality reconstruction.
We believe the key to achieving state-of-the-art quality on SDFs with our encoding is to find a way to overcome microstructure caused by hash collisions.
We are interested in applying the multiresolution hash encoding to other low-dimensional tasks.
We have included a preliminary implementation that fits a radiance and density field directly from the noisy output of a volumetric path tracer.

Conclusion

Many graphics problems rely on task specific data structures
Our multi-resolution hash encoding provides a learning-based alternative
Low overhead allows it to be used in time-constrained settings
Speeds up NeRF by several orders of magnitude
Single-GPU training times measured in seconds
Smoothstep function used to reduce discontinuities
Offset each level by half of its voxel size to prevent zero derivatives
Octree sampling for NGLOD
Signed distance computed by triangle BVH and 32 stab rays
MLP prefixed by frequency encoding used as baseline
Ray marching step size set proportional to distance along ray
Skipping of empty space and occluded regions
Compaction of samples into dense buffers
Exponential stepping for large scenes
Warping of 3D domain not used
Occupancy grids used to skip ray marching steps

Link to paper#

Abstract#

Paper Content#

Introduction#

Background and related work#

Parametric encodings.#

Multiresolution hash encoding#

Implementation#

Experiments#

Gigapixel image approximation#

Signed distance functions#

Neural radiance caching#

Neural radiance and density fields (nerf)#

Mic#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Background and related work

Parametric encodings.

Multiresolution hash encoding

Implementation

Experiments

Gigapixel image approximation

Signed distance functions

Neural radiance caching

Neural radiance and density fields (nerf)

Mic

Conclusion