Abstract

  • Introduces k-planes, a white-box model for radiance fields in arbitrary dimensions
  • Uses d choose 2 planes to represent a d-dimensional scene
  • Planar factorization makes adding dimension-specific priors easy
  • Linear feature decoder with a learned color basis performs similarly to a nonlinear black-box MLP decoder
  • Yields competitive and often state-of-the-art reconstruction fidelity with low memory usage

Paper Content

Introduction

  • Recent interest in dynamic radiance fields requires 4D volume representations.
  • Storing a 4D volume directly is expensive due to the curse of dimensionality.
  • We propose a factorization of 4D volumes that is simple, interpretable, compact, and yields fast training and rendering.
  • We use six planes to represent a 4D volume, where the first three represent space and the last three represent space-time changes.
  • Our model uses (4 choose 2) = 6 planes (hex-planes) in 4D and reduces to (3 choose 2) = 3 planes (tri-planes) in 3D.
  • Our model achieves competitive performance across reconstruction quality, model size, and optimization time.
  • K-planes is an interpretable, explicit model applicable to static scenes, scenes with varying appearance, and dynamic scenes
  • Model size is compact and optimization time is fast
  • Model extends to arbitrary dimensions
  • Several works have used geometric representations to reduce optimization time
  • K-planes defines a unified framework for efficient and interpretable factorizations of 3D and 4D volumes
  • Applications such as VR and CT require 4D reconstruction
  • Several works have proposed extensions of NeRF to dynamic scenes
  • K-planes combines a fully explicit representation with a built-in decomposition of static and dynamic components
  • K-planes can reconstruct unbounded environments with varying appearance
  • K-planes is the first hybrid method to successfully reconstruct challenging scenes

K-planes model

  • Proposed model for representing scenes in arbitrary dimensions
  • Low memory usage and fast training and rendering
  • Factorization models a d-dimensional scene using k = (d choose 2) = d(d-1)/2 planes
  • For static 3D scenes, this results in tri-planes
  • For dynamic 4D scenes, this results in hex-planes
  • For a 5D space, use (5 choose 2) = 10 planes (deca-planes)
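The plane count is just the number of unordered axis pairs. The sketch below enumerates them with the standard library; the axis names, including the fifth axis "u", are illustrative labels only.

```python
from itertools import combinations
from math import comb

def plane_axes(axes):
    """Each unordered pair of axes indexes one 2D feature plane."""
    return ["".join(pair) for pair in combinations(axes, 2)]

for axes in ["xyz", "xyzt", "xyztu"]:  # "u" is a hypothetical 5th axis
    planes = plane_axes(axes)
    d = len(axes)
    # k = d choose 2 = d(d-1)/2
    assert len(planes) == comb(d, 2) == d * (d - 1) // 2
    print(f"d={d}: k={len(planes)} planes {planes}")

# In 4D the six planes split into space-only and space-time groups.
hex_planes = plane_axes("xyzt")
space_only = [p for p in hex_planes if "t" not in p]
space_time = [p for p in hex_planes if "t" in p]
print(space_only, space_time)  # ['xy', 'xz', 'yz'] ['xt', 'yt', 'zt']
```

The split at the end mirrors the three space planes and three space-time planes of the hex-plane model.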

Hex-planes

  • Hex-planes factorization uses six planes
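  • Each plane has shape N x N x M, where N is the plane resolution and M is the number of feature channels
  • Features of a 4D coordinate q are obtained by projecting it onto each of the six planes and bilinearly interpolating
  • The six feature vectors are combined with the Hadamard (elementwise) product
  • Hadamard product allows k-planes to produce spatially localized signals
  • Because the product already localizes signals, the feature decoder does not have to learn to do so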
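A minimal pure-Python sketch of the hex-plane lookup, assuming coordinates normalized to [0, 1], one shared resolution for every axis (the time axis may differ in practice), random initialization, and a small feature length M = 4; these values and the helper names are hypothetical, not the authors' code.

```python
import random
from itertools import combinations

M = 4    # feature length per plane (hypothetical; the static model uses M = 32)
RES = 8  # plane resolution N, same for every axis here (a simplification)

def make_plane(res, m, init):
    """An N x N x M plane stored as nested lists: plane[i][j][c]."""
    return [[[init() for _ in range(m)] for _ in range(res)] for _ in range(res)]

def bilerp(plane, u, v):
    """Bilinearly interpolate a plane at continuous coords u, v in [0, 1]."""
    n = len(plane)
    fu, fv = u * (n - 1), v * (n - 1)
    i0, j0 = min(int(fu), n - 2), min(int(fv), n - 2)
    a, b = fu - i0, fv - j0
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - b) * plane[i0][j0][c] + b * plane[i0][j0 + 1][c]
        bot = (1 - b) * plane[i0 + 1][j0][c] + b * plane[i0 + 1][j0 + 1][c]
        out.append((1 - a) * top + a * bot)
    return out

def hexplane_features(planes, q):
    """Project q onto every axis pair, interpolate, and combine by Hadamard product."""
    feat = [1.0] * M
    for (i, j), plane in planes.items():
        f = bilerp(plane, q[i], q[j])
        feat = [x * y for x, y in zip(feat, f)]
    return feat

random.seed(0)
planes = {pair: make_plane(RES, M, random.random) for pair in combinations(range(4), 2)}
q = (0.3, 0.7, 0.5, 0.25)  # (x, y, z, t) normalized to [0, 1]
print(hexplane_features(planes, q))  # one M-dimensional feature vector
```

Because the planes are multiplied, a plane filled with ones has no effect on the result, and a plane that is near zero in one region suppresses the features there regardless of the other planes. Summing the planes instead would not give this localized behavior.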

Interpretability

  • Separation of space-only and space-time planes makes model interpretable and enables incorporation of dimension-specific priors
  • Multiscale planes used to encourage spatial smoothness and coherence
  • Total variation regularization encourages sparse gradients
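  • A 1D Laplacian (second-derivative) filter along the time axis encourages smooth motion
  • Sparse transients: space-time planes are initialized to 1 and regularized toward 1, so static content is left to the space-only planes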
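The three priors can be sketched as simple penalties on a single-channel plane. This is an illustration under stated assumptions: squared differences for total variation and smoothness, an L1 penalty on (1 - P) for sparse transients, unit weights, and space-time planes indexed as P[s][t]. The paper's exact norms, weights, and which planes each term applies to may differ.

```python
def total_variation(plane):
    """Squared-difference TV over both axes of a 2D plane (one channel for brevity)."""
    rows, cols = len(plane), len(plane[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += (plane[i + 1][j] - plane[i][j]) ** 2
            if j + 1 < cols:
                tv += (plane[i][j + 1] - plane[i][j]) ** 2
    return tv

def time_smoothness(st_plane):
    """1D Laplacian (second difference) along the time axis of a space-time plane P[s][t]."""
    loss = 0.0
    for row in st_plane:
        for t in range(1, len(row) - 1):
            loss += (row[t - 1] - 2 * row[t] + row[t + 1]) ** 2
    return loss

def sparse_transients(st_plane):
    """L1 penalty pulling space-time planes toward 1, the identity of the Hadamard product."""
    return sum(abs(1.0 - v) for row in st_plane for v in row)

# A feature that changes linearly in time: no Laplacian penalty, but nonzero TV and L1.
ramp = [[1.0 + 0.5 * t for t in range(5)] for _ in range(3)]
print(total_variation(ramp), time_smoothness(ramp), sparse_transients(ramp))  # 3.0 0.0 15.0
```

The ramp shows the division of labor: the Laplacian term only penalizes acceleration, so steady motion is free, while the sparse-transient term still charges for any departure from a static scene.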

Feature decoders

  • Two methods to decode feature vector into density and view-dependent color
  • Spherical harmonic decoders offer high-fidelity reconstructions and interpretability
  • Explicit model: replaces the spherical harmonic basis functions with a learned color basis and uses a linear decoder for density
  • Hybrid model: uses an MLP decoder
  • Extension of k-planes to scenes with consistent, static geometry but varying appearance
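A rough sketch of the explicit (linear) decoder, assuming the learned basis is a tiny MLP from view direction to one M-vector per RGB channel, a ReLU on density, and a sigmoid on color. All weights are random stand-ins for learned parameters, and the activations and layer sizes are assumptions rather than the paper's settings.

```python
import math
import random

M = 4  # feature length (hypothetical)
random.seed(0)

def affine(w, b, x):
    """y = W x + b for nested-list matrices."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def rand_mat(rows, cols):
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

# Hypothetical parameters: random here, learned by gradient descent in practice.
W1, b1 = rand_mat(16, 3), [0.0] * 16            # view direction -> hidden layer
W2, b2 = rand_mat(3 * M, 16), [0.0] * (3 * M)   # hidden layer -> 3 x M basis
w_sigma = [random.gauss(0.0, 0.5) for _ in range(M)]

def color_basis(d):
    """Tiny MLP: view direction -> one M-vector per RGB channel (replaces the SH basis)."""
    h = [max(0.0, v) for v in affine(W1, b1, d)]
    flat = affine(W2, b2, h)
    return [flat[c * M:(c + 1) * M] for c in range(3)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def explicit_decode(f, d):
    """Density and color logits are both linear in the plane features f."""
    sigma = max(0.0, sum(w * x for w, x in zip(w_sigma, f)))  # ReLU: assumed activation
    rgb = [sigmoid(sum(b * x for b, x in zip(basis, f))) for basis in color_basis(d)]
    return sigma, rgb

f = [0.2, -0.1, 0.4, 0.3]  # e.g. output of the plane lookup
d = [0.0, 0.0, 1.0]        # unit view direction
print(explicit_decode(f, d))
```

Only the basis depends nonlinearly on the view direction; for a fixed direction, everything the planes store reaches the output through dot products, which is what keeps this decoder interpretable.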

Optimization details

  • Implemented normalized device coordinates (NDC) and an L∞ version of scene contraction
  • Used proposal sampling, with small k-planes models serving as the proposal density models
  • Implemented importance sampling of rays based on temporal differences for dynamic scenes
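Two of these details can be sketched in a few lines. The contraction below assumes the usual form "identity inside the unit ball, 2 - 1/||x|| outside" with the max norm substituted, which maps all of space into the cube [-2, 2]^3. The temporal weighting is a simplified, hypothetical stand-in: each pixel's weight is its largest color change within a small frame window, with a floor so static pixels are still sampled; the window size, floor, and grayscale input are assumptions.

```python
import random

def contract_linf(x):
    """Scene contraction with the max (L-infinity) norm: identity inside the unit cube,
    everything else squashed into the shell between norms 1 and 2."""
    n = max(abs(c) for c in x)
    if n <= 1.0:
        return list(x)
    scale = (2.0 - 1.0 / n) / n
    return [c * scale for c in x]

def temporal_weights(video, window=2, floor=0.05):
    """Hypothetical per-pixel sampling weights from temporal differences.
    video[t][p] is the grayscale intensity of pixel p at frame t."""
    T, P = len(video), len(video[0])
    weights = []
    for t in range(T):
        row = []
        for p in range(P):
            lo, hi = max(0, t - window), min(T, t + window + 1)
            diff = max(abs(video[t][p] - video[s][p]) for s in range(lo, hi))
            row.append(max(diff, floor))
        weights.append(row)
    return weights

print(contract_linf([3.0, 0.0, -1.5]))  # largest coordinate becomes 2 - 1/3

# Pixel 0 is static; pixel 1 flickers, so it should be sampled far more often.
video = [[0.5, 0.0], [0.5, 1.0], [0.5, 0.0], [0.5, 1.0]]
w = temporal_weights(video)
random.seed(0)
pixels = [(t, p) for t in range(len(w)) for p in range(len(w[0]))]
batch = random.choices(pixels, weights=[w[t][p] for t, p in pixels], k=8)
print(batch)
```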

Results

  • Experiments conducted in three domains: static scenes, dynamic scenes, and Phototourism scenes
  • Metrics used to measure results: PSNR and SSIM
  • Training time and number of parameters reported in Table 3
  • Full per-scene results in appendix
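For reference, PSNR for images with values in [0, 1] is 10 log10(1 / MSE). A minimal sketch on flattened pixel lists (SSIM is omitted because it needs windowed statistics):

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized, flattened images."""
    mse = sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)
    return float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)

# Every pixel off by 0.1 gives MSE = 0.01, i.e. roughly 20 dB.
print(psnr([0.1] * 4, [0.0] * 4))
```

Each 10x reduction in MSE adds 10 dB, so small PSNR gaps between methods can still reflect meaningful error differences.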

Static scenes

  • Demonstrated triplane model on synthetic scenes from NeRF
  • Used model with four symmetric spatial resolutions and feature length M = 32
  • Explicit version matches prior state-of-the-art in terms of quality metrics
  • Hybrid version achieves slightly higher quality metrics
  • Results on forward-facing, real scenes from LLFF are similar to those on synthetic scenes
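A back-of-the-envelope count shows why planes stay compact as resolution grows. The resolutions below are hypothetical, planes are assumed square at every scale, and the concatenated feature length assumes features from each scale are concatenated before decoding.

```python
from math import comb

def kplanes_params(resolutions, m, d):
    """Stored features for (d choose 2) square N x N x m planes at each scale."""
    return sum(comb(d, 2) * n * n * m for n in resolutions)

def dense_params(resolutions, m, d):
    """Stored features for a dense d-dimensional grid at the same scales."""
    return sum(n ** d * m for n in resolutions)

scales = [64, 128, 256, 512]  # hypothetical four-scale setup
M = 32
print(kplanes_params(scales, M, 3))  # 33423360
print(dense_params(scales, M, 3))    # 4907335680
print(len(scales) * M)               # 128 features per point if scales are concatenated
```

The plane count grows as N^2 per scale while the dense grid grows as N^d, which is the curse of dimensionality the factorization avoids.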

Dynamic scenes

  • Evaluated hexplane model on two dynamic scene datasets
  • D-NeRF dataset contains 8 videos of varying duration
  • DyNeRF dataset contains six 10-second videos recorded at 30 fps
  • Both explicit and hybrid models outperform D-NeRF
  • Hexplane model naturally disentangles dynamic and static portions of the scene
  • Visualizing the time planes shows where motion occurs in a video
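One hypothetical way to read a time plane: since space-time planes are pulled toward 1, rows that stay near 1 across time correspond to static regions, and rows that deviate mark motion. The heuristic and threshold below are illustrative assumptions, not the paper's visualization procedure.

```python
def motion_profile(st_plane):
    """Hypothetical heuristic: mean deviation from 1 (the Hadamard identity)
    over time for each spatial row of a space-time plane P[s][t]."""
    return [sum(abs(v - 1.0) for v in row) / len(row) for row in st_plane]

def moving_rows(st_plane, threshold=0.1):
    """Spatial indices whose features vary enough over time to count as motion."""
    return [s for s, m in enumerate(motion_profile(st_plane)) if m > threshold]

# Toy x-t plane: rows 0 and 2 are static (all ones), row 1 oscillates over time.
xt = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.5, 0.5, 1.5],
    [1.0, 1.0, 1.0, 1.0],
]
print(motion_profile(xt))  # [0.0, 0.375, 0.0]
print(moving_rows(xt))     # [1]
```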

Variable appearance

  • Phototourism dataset is used in variable appearance experiments
  • Experiments are similar to NeRF-W
  • Test images are evaluated by optimizing a per-image appearance code on the left half of each image and computing metrics on the right half
  • Appearance can be varied by interpolating between appearance codes
  • 32-dimensional appearance code is sufficient to accurately capture global appearance changes
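Interpolating appearance amounts to a linear blend of two 32-dimensional codes. In this sketch the codes are random placeholders for learned ones, and how the code is fed to the color decoder is not shown.

```python
import random

def lerp_codes(a, b, alpha):
    """Linearly interpolate two appearance codes: alpha=0 gives a, alpha=1 gives b."""
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]

random.seed(0)
code_a = [random.gauss(0.0, 1.0) for _ in range(32)]  # hypothetical learned code
code_b = [random.gauss(0.0, 1.0) for _ in range(32)]  # hypothetical learned code

# Five evenly spaced codes from code_a to code_b; each would be rendered separately.
path = [lerp_codes(code_a, code_b, k / 4) for k in range(5)]
print(len(path), len(path[0]))  # 5 32
```

Because geometry lives in the planes and only the code changes, every frame along this path renders the same scene structure under a different global appearance.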

Conclusions

  • Proposed method decomposes a d-dimensional space into (d choose 2) planes
  • Method can be optimized from indirect measurements
  • Scales gracefully with increasing dimension
  • Applies to 3D static scenes and 4D dynamic videos
  • Can extend to unconstrained scene reconstruction
  • Demonstrates competitive performance across varied tasks
  • Optimization time and model size scale with dimension
  • Figure (method overview): features are sampled with multiscale bilinear interpolation, decoded with an MLP or linear decoder, and optimized by minimizing a reconstruction loss
  • Figure: elementwise addition vs. multiplication of plane features
  • Figure: visual comparison of k-planes with other methods
  • Figure: interpolating the appearance code alters visual appearance