Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Introduces k-planes, a white-box model for radiance fields in arbitrary dimensions
- Uses d choose 2 planes to represent a d-dimensional scene
- Planar factorization makes adding dimension-specific priors easy
- Linear feature decoder with a learned color basis yields similar performance as a nonlinear black-box MLP decoder
- Yields competitive and often state-of-the-art reconstruction fidelity with low memory usage
Paper Content
Introduction
- Recent interest in dynamic radiance fields requires 4D volume representations.
- Storing a 4D volume directly is expensive due to the curse of dimensionality.
- We propose a factorization of 4D volumes that is simple, interpretable, compact, and yields fast training and rendering.
- We use six planes to represent a 4D volume, where the first three represent space and the last three represent space-time changes.
- Our model uses 4 2 = 6 hexplanes in 4D and reduces to 3 2 = 3 tri-planes in 3D.
- Our model achieves competitive performance across reconstruction quality, model size, and optimization time.
Related work
- K-planes is an interpretable, explicit model applicable to static, varying, and dynamic scenes
- Model size is compact and optimization time is fast
- Model extends to arbitrary dimensions
- Several works have used geometric representations to reduce optimization time
- K-planes defines a unified framework for efficient and interpretable factorizations of 3D and 4D volumes
- Applications such as VR and CT require 4D reconstruction
- Several works have proposed extensions of NeRF to dynamic scenes
- K-planes combines a fully explicit representation with a built-in decomposition of static and dynamic components
- K-planes can reconstruct unbounded environments with varying appearance
- K-planes is the first hybrid method to successfully reconstruct challenging scenes
K-planes model
- Proposed model for representing scenes in arbitrary dimensions
- Low memory usage and fast training and rendering
- Factorization models a d-dimensional scene using k = d2 planes
- For static 3D scenes, this results in tri-planes
- For dynamic 4D scenes, this results in hex-planes
- For 5D space, use 5 2 = 10 deca-planes
Hex-planes
- Hex-planes factorization uses six planes
- Each plane has shape N xN xM
- Features of 4D coordinate q are obtained by projecting it onto the six planes
- Features are combined using Hadamard product
- Hadamard product allows k-planes to produce spatially localized signals
- Hadamard product relieves feature decoder of extra task
Interpretability
- Separation of space-only and space-time planes makes model interpretable and enables incorporation of dimension-specific priors
- Multiscale planes used to encourage spatial smoothness and coherence
- Total variation regularization encourages sparse gradients
- Laplacian filter used to encourage smooth motion
- Sparse transients used to separate space and time
Feature decoders
- Two methods to decode feature vector into density and view-dependent color
- Spherical harmonic decoders offer high-fidelity reconstructions and interpretability
- Replace spherical harmonic basis functions with a learned basis
- Linear decoder for density and MLP decoder for hybrid model
- Extension of k-planes model to represent scenes with consistent, static geometry
Optimization details
- Implemented NDC and โ version of scene contraction
- Used proposal sampling with k-planes as density models
- Implemented importance sampling based on temporal difference
Results
- Experiments conducted in three domains: static scenes, dynamic scenes, and Phototourism scenes
- Metrics used to measure results: PSNR and SSIM1
- Training time and number of parameters reported in Table 3
- Full per-scene results in appendix
Static scenes
- Demonstrated triplane model on synthetic scenes from NeRF
- Used model with four symmetric spatial resolutions and feature length M = 32
- Explicit version matches prior state-of-the-art in terms of quality metrics
- Hybrid version achieves slightly higher quality metrics
- Results on unbounded, real scenes from LLFF similar to synthetic scenes
Dynamic scenes
- Evaluated hexplane model on two dynamic scene datasets
- D-NeRF dataset contains 8 videos of varying duration
- DyNeRF dataset contains 6 10-second videos recorded at 30 fps
- Both explicit and hybrid models outperform D-NeRF
- Hexplane model naturally disentangles dynamic and static portions of the scene
- Visualize time planes to better understand where motion occurs in a video
Variable appearance
- Phototourism dataset is used in variable appearance experiments
- Experiments are similar to NeRF-W
- Test images are evaluated by optimizing per-image appearance feature on left half and computing metrics on right half
- Interpolation in appearance code space is possible
- 32-dimensional appearance code is sufficient to accurately capture global appearance changes
Conclusions
- Proposed method decomposes d-dimensional space into d2 planes
- Method can be optimized from indirect measurements
- Scales gracefully with increasing dimension
- Applies to 3D static scenes and 4D dynamic videos
- Can extend to unconstrained scene reconstruction
- Demonstrates competitive performance across varied tasks
- Optimization time and model size scale with dimension
- Uses multiscale bilinear interpolation
- Features are decoded with MLP or linear decoder
- Model optimized by minimizing reconstruction loss
- Elementwise addition vs multiplication of plane features
- Visual comparison of k-planes with other methods
- Interpolation of appearance code to alter visual appearance