Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Introduces k-planes, a white-box model for radiance fields in arbitrary dimensions
Uses d choose 2 planes to represent a d-dimensional scene
Planar factorization makes adding dimension-specific priors easy
Linear feature decoder with a learned color basis yields similar performance as a nonlinear black-box MLP decoder
Yields competitive and often state-of-the-art reconstruction fidelity with low memory usage

Paper Content

Introduction

Recent interest in dynamic radiance fields requires 4D volume representations.
Storing a 4D volume directly is expensive due to the curse of dimensionality.
We propose a factorization of 4D volumes that is simple, interpretable, compact, and yields fast training and rendering.
We use six planes to represent a 4D volume, where the first three represent space and the last three represent space-time changes.
Our model uses 4 2 = 6 hexplanes in 4D and reduces to 3 2 = 3 tri-planes in 3D.
Our model achieves competitive performance across reconstruction quality, model size, and optimization time.

K-planes is an interpretable, explicit model applicable to static, varying, and dynamic scenes
Model size is compact and optimization time is fast
Model extends to arbitrary dimensions
Several works have used geometric representations to reduce optimization time
K-planes defines a unified framework for efficient and interpretable factorizations of 3D and 4D volumes
Applications such as VR and CT require 4D reconstruction
Several works have proposed extensions of NeRF to dynamic scenes
K-planes combines a fully explicit representation with a built-in decomposition of static and dynamic components
K-planes can reconstruct unbounded environments with varying appearance
K-planes is the first hybrid method to successfully reconstruct challenging scenes

K-planes model

Proposed model for representing scenes in arbitrary dimensions
Low memory usage and fast training and rendering
Factorization models a d-dimensional scene using k = d2 planes
For static 3D scenes, this results in tri-planes
For dynamic 4D scenes, this results in hex-planes
For 5D space, use 5 2 = 10 deca-planes

Hex-planes

Hex-planes factorization uses six planes
Each plane has shape N xN xM
Features of 4D coordinate q are obtained by projecting it onto the six planes
Features are combined using Hadamard product
Hadamard product allows k-planes to produce spatially localized signals
Hadamard product relieves feature decoder of extra task

Interpretability

Separation of space-only and space-time planes makes model interpretable and enables incorporation of dimension-specific priors
Multiscale planes used to encourage spatial smoothness and coherence
Total variation regularization encourages sparse gradients
Laplacian filter used to encourage smooth motion
Sparse transients used to separate space and time

Feature decoders

Two methods to decode feature vector into density and view-dependent color
Spherical harmonic decoders offer high-fidelity reconstructions and interpretability
Replace spherical harmonic basis functions with a learned basis
Linear decoder for density and MLP decoder for hybrid model
Extension of k-planes model to represent scenes with consistent, static geometry

Optimization details

Implemented NDC and ∞ version of scene contraction
Used proposal sampling with k-planes as density models
Implemented importance sampling based on temporal difference

Results

Experiments conducted in three domains: static scenes, dynamic scenes, and Phototourism scenes
Metrics used to measure results: PSNR and SSIM1
Training time and number of parameters reported in Table 3
Full per-scene results in appendix

Static scenes

Demonstrated triplane model on synthetic scenes from NeRF
Used model with four symmetric spatial resolutions and feature length M = 32
Explicit version matches prior state-of-the-art in terms of quality metrics
Hybrid version achieves slightly higher quality metrics
Results on unbounded, real scenes from LLFF similar to synthetic scenes

Dynamic scenes

Evaluated hexplane model on two dynamic scene datasets
D-NeRF dataset contains 8 videos of varying duration
DyNeRF dataset contains 6 10-second videos recorded at 30 fps
Both explicit and hybrid models outperform D-NeRF
Hexplane model naturally disentangles dynamic and static portions of the scene
Visualize time planes to better understand where motion occurs in a video

Variable appearance

Phototourism dataset is used in variable appearance experiments
Experiments are similar to NeRF-W
Test images are evaluated by optimizing per-image appearance feature on left half and computing metrics on right half
Interpolation in appearance code space is possible
32-dimensional appearance code is sufficient to accurately capture global appearance changes

Conclusions

Proposed method decomposes d-dimensional space into d2 planes
Method can be optimized from indirect measurements
Scales gracefully with increasing dimension
Applies to 3D static scenes and 4D dynamic videos
Can extend to unconstrained scene reconstruction
Demonstrates competitive performance across varied tasks
Optimization time and model size scale with dimension
Uses multiscale bilinear interpolation
Features are decoded with MLP or linear decoder
Model optimized by minimizing reconstruction loss
Elementwise addition vs multiplication of plane features
Visual comparison of k-planes with other methods
Interpolation of appearance code to alter visual appearance

Link to paper#

Abstract#

Paper Content#

Introduction#

Related work#

K-planes model#

Hex-planes#

Interpretability#

Feature decoders#

Optimization details#

Results#

Static scenes#

Dynamic scenes#

Variable appearance#

Conclusions#

Link to paper

Abstract

Paper Content

Introduction

Related work

K-planes model

Hex-planes

Interpretability

Feature decoders

Optimization details

Results

Static scenes

Dynamic scenes

Variable appearance

Conclusions