Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Multiplane Image (MPI) is an effective and efficient representation for view synthesis from sparse inputs.
Structural MPI (S-MPI) approximates 3D scenes concisely and bridges view synthesis and 3D reconstruction.
Challenges include high-fidelity approximation, multi-view consistency, non-planar regions modeling, and efficient rendering.
Transformer-based network proposed to predict compact and expressive S-MPI layers.
Experiments show method outperforms previous state-of-the-art MPI-based view synthesis methods and planar reconstruction methods.

Paper Content

Introduction

Aim to generate new images from transformed viewpoints
Advance of neural networks drives progress
NeRF-based methods have limitations
MPI representation has superior abilities
Neural networks used to construct MPI layers
Novel views rendered in real-time
Standard MPI has underlying limitations
Sensitive to discretization and introduces redundancy
Construct MPIs adaptive to 3D scenes
End-to-end planar reconstruction from images
Introduce Structural MPI representation
End-to-end network to construct S-MPI
Global proxy embeddings encode full 3D scene

View synthesis with explicit representations
Layered Depth Image (LDI) uses several layers of depth maps and associated color values
Multiplane Image (MPI) is a popular variant of LDI
Structural MPI overcomes the MPI’s drawbacks
View synthesis with implicit representations
NeRF encodes 3D objects and scenes in the weights of an MLP
Planar 3D Approximation
Piece-wise planar depth map reconstruction is a traditional research topic
Detection-based framework generates plane segments and 3D plane parameters

Structural multiplane image

Introduces structural multiplane image representation
Geometry formulation based on standard MPI formulation

Geometry formulation

MPI consists of a collection of planes parallel to the image plane
Non-planar regions are represented as Non-planar regions
Fronto-parallel MPI is a special case of S-MPI where all planes have the same normal as the image plane
Non-planar regions are distributed into nearby fronto-parallel planes
S-MPI contains hybrid structures of geometrically faithful planes and depth-adaptive fronto-parallel planes
Final S-MPI for the complete scene is extended with N n elements with a fixed normal n z

Rendering formulation

Standard MPI has a global back-to-front rendering order of planes for each pixel
S-MPI has different rendering orders for pixels due to planes intersecting with each other
Calculate depth values for each pixel on each plane
Rearrange RGBα images with the depth order for each pixel
Render novel views by transforming plane parameters from the source view

Structural multiplane image transformer

Goal is to construct S-MPI representation for novel view synthesis and planar reconstruction
Model predicts S-MPI with appropriate number of posed planes and RGBα layers
Model inspired by neural planar reconstruction methods
Model differentiates non-planar instances according to their depth range
Model predicts proxies with structure class and visible projected mask of the plane

Single-view network

Model is built on a universal image segmentation network with two branches
One branch upsamples features and generates high-resolution per-pixel embeddings
The other branch generates N proxy embeddings at the instance level
Loss function jointly optimizes view synthesis and planar estimation
Loss function includes RGB L1 loss, SSIM loss, cross-entropy loss, focal loss, dice loss, and L1 loss

Multi-view network

Multi-view input can enlarge the range of view synthesis
Single-view methods generating planar reconstructions are not aligned in 3D space
Goal is to deliver multi-view consistent planar reconstruction
Network design uses global proxy embeddings across views
Multi-view alignment uses global plane poses to align instances in each view
RGBα embeddings and segmentation embeddings are shared globally
Image merging fuses rendered images with alpha weights as confidence maps

Experiments

S-MPI is a computer science tool that is effective for view synthesis and reconstruction from single-view and multi-view settings.
S-MPI is tested on two datasets, NYUv2 and ScanNet, and outperforms other methods.

Implementation

Ground truth labels of plane instance masks and plane parameters used from [25]
Non-planar regions sampled S-MPIs according to depth range of scene
Images resized to 256x384 for training and evaluation
ResNet50 used as backbone
Adam Optimizer used with initial learning rate of 0.0001 and weight decay of 0.05
Bootstrap training phase of 50k steps
Model trained on 4 NVIDIA-V100 GPUs for 100k steps
Multi-view input with T=2

Single-view evaluation

Planar reconstruction methods are evaluated using planar estimation metrics and segmentation metrics
Depth estimation methods are compared with planar reconstruction methods and MPI-based view synthesis methods
View synthesis methods are compared with standard MPI-based methods
Results show that the proposed method achieves better reconstruction and view synthesis performance

Multi-view evaluation

Compared PlaneMVS on plane detection and depth reconstruction accuracy
Measured plane detection by Average Precision with IoU 0.5 and varying depth error
Compared single-view planar segmentation results on ScanNet
Followed data settings from DP-NeRF to sample images sparsely and generate dense depth maps
Trained on two nearest images and compared results to MPI-based and NeRF-based methods
Outperformed MPI-based methods and achieved comparable results to NeRF-based methods
Compared rendering speed with NeRF-based method

Ablation study

Multi-view model trained with input image pairs with gap of < 20 frames and view synthesis target image < 30 frames away from middle of two input images
Global proxy embedding strategy produces better aligned images than processing multi-view images one by one
Method benefits from consistent multi-view geometry supervision and produces more accurate planar reconstruction than given two identical images
Performance drops if view gap is too large
Method performs better in scenes with more planar coverage

Conclusion

Introduction of Structural MPI (S-MPI) representation for neural view synthesis and 3D reconstruction
End-to-end model proposed to construct S-MPI
Global alignment scheme to generate aligned images for view synthesis
Limitations and future work
Need ground-truth plane segmentation and poses
Capability to use multiple parallel layers to simulate non-Lambertian effects
Figures 1-8 to illustrate S-MPI, formulation, rendering, transformer, embedding, reconstruction, synthesis, and ablation study
Quantitative view synthesis results on ScanNet

Link to paper#

Abstract#

Paper Content#

Introduction#

Related works#

Structural multiplane image#

Geometry formulation#

Rendering formulation#

Structural multiplane image transformer#

Single-view network#

Multi-view network#

Experiments#

Implementation#

Single-view evaluation#

Multi-view evaluation#

Ablation study#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Related works

Structural multiplane image

Geometry formulation

Rendering formulation

Structural multiplane image transformer

Single-view network

Multi-view network

Experiments

Implementation

Single-view evaluation

Multi-view evaluation

Ablation study

Conclusion