Abstract
- The proposed framework, InfiniCity, constructs and renders an unconstrainedly large, 3D-grounded environment from random noise
- Decomposed into three modules: 2D map synthesis, 3D octree completion, and voxel-based neural rendering
- Synthesizes arbitrary-scale and traversable 3D city environments
- Allows flexible and interactive editing from users
Paper Content
Introduction
- Rapid evolution in generative modeling research
- Generators can synthesize high-quality images, 3D content and videos
- Most works focus on bounded space
- Recent attempts achieve infinite visual synthesis with neural implicit models
- Take city scenes as a case study
- Synthesizing 3D environment broken down into stages of global structure planning and local perfection
- Proposed InfiniCity pipeline for infinite-scale 3D city scene generation
- Framework makes best use of both 2D and 3D data
- Interactive sampling GUI for fast and flexible user interaction
Related work
- Attempts to generate infinite environments using finite images
- Divide-and-conquer strategy used to generate small patches
- Autoregressive and non-autoregressive inference processes used
- Generate 3D-grounded traversable environment of infinite scale
- Leverage explicit 3D supervision to learn geometry of 3D environment
- Octree used as 3D representation
- Learn 3D structure from 2D image collections
- GAN-based framework used for texturization
InfiniCity
- Generate infinite-scale 3D city scenes using 2D and 3D data
- InfiniCity synthesis pipeline consists of three components
- First component generates an arbitrarily large satellite map from random noise
- Second component converts map into watertight voxel environment
- Third component texturizes voxel world
Data preprocessing
- Dataset consists of images with GPS-registered camera poses and a CAD model
- Data is processed into three forms: octree-based voxel completion, bird’s-eye view scan, and street-view render
- Octree-based voxel completion: CAD model converted to set of octrees, surface octrees extracted into 2D images
- Street-view render: GPS-registered camera location and annotated camera orientation used to render segmentation images
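The bird's-eye view scan can be pictured as reading off the top-most occupied voxel in every vertical column. The following is a minimal sketch under stated assumptions, not the authors' preprocessing code: it uses a dense NumPy grid rather than an octree, assumes the last axis points up, and the function name `birds_eye_scan` and the label values are hypothetical.

```python
import numpy as np

def birds_eye_scan(labels: np.ndarray, empty: int = 0):
    """Scan a labelled voxel grid from above.

    labels has shape (X, Y, Z) with Z pointing up; `empty` marks free space.
    Returns (category_map, height_map), both of shape (X, Y).
    """
    occupied = labels != empty
    # Flip the Z axis so argmax finds the first occupied voxel seen from above.
    first_hit_from_top = np.argmax(occupied[:, :, ::-1], axis=2)
    top_z = labels.shape[2] - 1 - first_hit_from_top
    has_any = occupied.any(axis=2)
    # Height is counted in voxels; empty columns get height 0.
    height_map = np.where(has_any, top_z + 1, 0)
    # Read the class label stored at the top-most occupied voxel.
    category_map = np.take_along_axis(labels, top_z[..., None], axis=2)[..., 0]
    category_map = np.where(has_any, category_map, empty)
    return category_map, height_map

# Toy grid: label 1 stands in for "building", label 2 for "road" (illustrative only).
grid = np.zeros((2, 2, 4), dtype=np.int64)
grid[0, 0, :3] = 1   # a column three voxels tall
grid[1, 1, 0] = 2    # a column one voxel tall
category_map, height_map = birds_eye_scan(grid)
```

Each column contributes one pixel to the category map and one to the height map, which is what lets a 3D surface be stored as a set of aligned 2D images.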
Infinite 2D map synthesis
- Directly generating an unbounded 3D environment is currently impractical
- We propose to start by synthesizing the corresponding 2D map
- Leverage the infinite-pixel image synthesis ability of InfinityGAN
- Generate categorical labels instead of real RGB satellite images
- Model height map and surface normal vector to regularize structural plausibility
- Apply contrastive patch discriminator to increase importance of fine-grained details
- Synthesize tuples of images of arbitrary scale
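To see how the height map and the surface normal map relate, the sketch below derives per-pixel normals from a height map with finite differences. This is an illustration under assumptions: the summary does not say how the normal modality is obtained, and `normals_from_height` and `cell_size` are hypothetical names.

```python
import numpy as np

def normals_from_height(height: np.ndarray, cell_size: float = 1.0) -> np.ndarray:
    """Return unit surface normals of shape (X, Y, 3) for a height map z = h(x, y)."""
    # np.gradient returns one derivative array per axis: d/dx (axis 0), d/dy (axis 1).
    dz_dx, dz_dy = np.gradient(height.astype(np.float64), cell_size)
    # The surface z = h(x, y) has the (unnormalised) normal (-dh/dx, -dh/dy, 1).
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

flat = np.zeros((4, 4))                              # flat ground
ramp = np.tile(np.arange(4.0)[:, None], (1, 4))      # height rises by 1 per cell along x
flat_normals = normals_from_height(flat)
ramp_normals = normals_from_height(ramp)
```

Flat ground yields normals pointing straight up, while the ramp tilts them away from the uphill direction. A generated height map and normal map that disagree in this sense would be structurally implausible, which is one way to read the regularization role of the extra modalities.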
Voxel world completion
- Model ensures final voxel structure is watertight and maintains original voxel surfaces
- Adopt PVD as a critical baseline
- Measure distribution distance similar to FID using an autoencoder
- Outperforms PVD in evaluation setting
- Pillar method creates undesired appearances for certain object classes
- Synthesizing structure from satellite view simplifies and benefits structure synthesis
- Bilateral filtering improves plausibility of structure and suppresses noise
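The weakness of a pillar-style fill is easy to reproduce on a toy example. The sketch below assumes that the pillar method means marking every voxel beneath the scanned surface as solid; this is an assumed reading of the baseline, and `pillar_fill` is a hypothetical name.

```python
import numpy as np

def pillar_fill(height_map: np.ndarray, z_size: int) -> np.ndarray:
    """Fill every voxel below the surface height in each column.

    height_map has shape (X, Y) and stores heights in voxels.
    Returns a boolean occupancy grid of shape (X, Y, z_size).
    """
    z = np.arange(z_size)
    # A voxel at level z is solid when it lies strictly below the column height.
    return z[None, None, :] < height_map[:, :, None]

# One row of three columns: low ground, a surface at height 5, low ground.
# If the middle column is a bridge deck, the fill turns the open space
# beneath it into a solid wall.
heights = np.array([[1, 5, 1]])
occupancy = pillar_fill(heights, z_size=6)
```

The result is watertight by construction, but classes whose real geometry has empty space under the visible surface (a bridge, a tree canopy) come out as solid blocks, which is the kind of undesired appearance a learned completion model is meant to avoid.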
Texturization via neural rendering
- Our method is the first attempt to generate infinite-scale 3D environments using 2D and 3D data.
- We compare our method with GSN to illustrate the advantages of using 3D data.
- Results show 3D consistency of the 3D structure.
- GSN fails to learn the appearance of the city and its latent space fails to understand 3D information.
- Quantitative evaluation shows InfiniCity substantially outperforms GSN.
Interactive sampling GUI
- Generative models have difficulty maintaining consistent quality over large images.
- Artifacts can occur, such as bridges suddenly terminating in the middle of water.
- An interactive sampling GUI is developed to give imperfect images a second chance.
Experiments
Dataset processing
- InfiniCity algorithm extracts data modalities
- HoliCity is a large-scale dataset based on 3D London CAD model
- Dataset contains 50,024 images registered to CAD model
- Subset of CAD model used to train and evaluate algorithm
- Point sampling and voxel resolution used to partition space
- Octree created with 64 voxels on each edge
- Voxels scanned and projected onto 2D image
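A grid with 64 voxels per edge corresponds to six levels of binary subdivision, since 2^6 = 64. The sketch below shows how sampled surface points could be binned into such a grid. It is a dense-array stand-in for the octree, and the function name `voxelize` and the parameters `origin` and `block_size` are assumptions for illustration.

```python
import numpy as np

def voxelize(points: np.ndarray, origin: np.ndarray, block_size: float,
             resolution: int = 64) -> np.ndarray:
    """Mark the voxels of a cubic block that contain at least one sampled point.

    points has shape (N, 3); the block spans [origin, origin + block_size) per axis.
    Returns a boolean grid of shape (resolution, resolution, resolution).
    """
    voxel_size = block_size / resolution
    idx = np.floor((points - origin) / voxel_size).astype(np.int64)
    # Discard points that fall outside the block.
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    grid = np.zeros((resolution,) * 3, dtype=bool)
    kept = idx[inside]
    grid[kept[:, 0], kept[:, 1], kept[:, 2]] = True
    return grid

octree_depth = (64).bit_length() - 1   # levels of subdivision for a 64-voxel edge

samples = np.array([
    [0.0, 0.0, 0.0],       # lands in voxel (0, 0, 0)
    [63.9, 63.9, 63.9],    # lands in voxel (63, 63, 63)
    [64.0, 0.0, 0.0],      # outside the block, dropped
    [-0.1, 0.0, 0.0],      # outside the block, dropped
])
grid = voxelize(samples, origin=np.zeros(3), block_size=64.0)
```

An actual octree stores only the occupied cells at each level, so it uses far less memory than this dense array when most of the block is empty.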
Infinite 2D map generation
- The method synthesizes satellite maps across categorical, depth, and normal modalities
- User can interactively resample local latent variables within sub-region of map
- Contrastive patch discriminator improves generator quality
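Interactive resampling of local latent variables can be sketched as redrawing noise inside a user-selected rectangle while leaving the rest of the latent grid untouched. This is a minimal illustration that assumes the generator is driven by a spatial grid of Gaussian latents; `resample_region` and the `(x0, x1, y0, y1)` region format are hypothetical, and the generator call itself is omitted.

```python
import numpy as np

def resample_region(latents: np.ndarray, region: tuple,
                    rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `latents` with fresh noise inside `region`.

    latents has shape (H, W, C); region is (x0, x1, y0, y1) with exclusive upper bounds.
    """
    x0, x1, y0, y1 = region
    out = latents.copy()
    # Only the selected window gets new samples; everything else is preserved.
    out[x0:x1, y0:y1] = rng.standard_normal(out[x0:x1, y0:y1].shape)
    return out

rng = np.random.default_rng(0)
local_latents = rng.standard_normal((8, 8, 4))
edited = resample_region(local_latents, (2, 5, 3, 6), rng)
```

Feeding `edited` back through the generator would change only the map area influenced by the selected latents, so a user can retry a flawed patch, such as a bridge that ends mid-water, without regenerating the whole map.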
Conclusion
- Propose InfiniCity, a novel framework for unbounded 3D environment synthesis
- Produces high-quality and high-diversity results
- Plausible, traversable, easily editable structures at an infinite scale
- Quality of final rendering is bounded by neural rendering
- InfiniCity consists of three major modules
- Interactive resampling allows users to select region of interest
- Synthesized satellite maps with contrastive discriminator
- Octree-based voxel completion
- Voxel-based neural rendering
- Trajectory-wise image rendering results show better quality, structural consistency, and diversity
- Traversable and consistent 3D city rendering