Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Proposed framework, InfiniCity, constructs and renders an unconstrainedly large and 3D-grounded environment from random noises
  • Decomposed into three modules: 2D map synthesis, 3D octree completion, and voxel-based neural rendering
  • Synthesizes arbitrary-scale and traversable 3D city environments
  • Allows flexible and interactive editing from users

Paper Content

Introduction

  • Rapid evolution in generative modeling research
  • Generators can synthesize high-quality images, 3D content and videos
  • Most works focus on bounded space
  • Recent attempts to achieve infinite visual synthesis with neural implicit model
  • Take city scenes as a case study
  • Synthesizing 3D environment broken down into stages of global structure planning and local perfection
  • Proposed InfiniCity pipeline for infinite-scale 3D city scene generation
  • Framework makes best use of both 2D and 3D data
  • Interactive sampling GUI for fast and flexible user interaction
  • Attempts to generate infinite environments using finite images
  • Divide-and-conquer strategy used to generate small patches
  • Autoregressive and non-autoregressive inference processes used
  • Generate 3D-grounded traversable environment of infinite scale
  • Leverage explicit 3D supervision to learn geometry of 3D environment
  • Octree used as 3D representation
  • Learn 3D structure from 2D image collections
  • GAN-based framework used for texturization

Infinicity

  • Generate infinite-scale 3D city scenes using 2D and 3D data
  • InfiniCity synthesis pipeline consists of three components
  • First component generates arbitrarily large satellite map from random noises
  • Second component converts map into watertight voxel environment
  • Third component texturizes voxel world

Data preprocessing

  • Dataset consists of images with GPS-registered camera poses, CAD model
  • Data is processed for 3 modules: octree-based voxel completion, bird’s-eye view scan, street-view render
  • Octree-based voxel completion: CAD model converted to set of octrees, surface octrees extracted into 2D images
  • Street-view render: GPS-registered camera location and annotated camera orientation used to render segmentation images

Infinite 2d map synthesis

  • Generating 3D environments directly is currently not possible
  • We propose to start by synthesizing the corresponding 2D map
  • Leverage the infinite-pixel image synthesis ability of InfinityGAN
  • Generate categorical labels instead of real RGB satellite images
  • Model height map and surface normal vector to regularize structural plausibility
  • Apply contrastive patch discriminator to increase importance of fine-grained details
  • Synthesize tuples of images of arbitrary scale

Voxel world completion

  • Model ensures final voxel structure is watertight and maintains original voxel surfaces
  • Adopt PVD as a critical baseline
  • Measure distribution distance similar to FID using an autoencoder
  • Outperforms PVD in evaluation setting
  • Pillar method creates undesired appearances for certain object classes
  • Synthesizing structure from satellite view simplifies and benefits structure synthesis
  • Bilateral filtering improves plausibility of structure and suppresses noises

Texturization via neural rendering

  • Our method is the first attempt to generate infinite-scale 3D environments using 2D and 3D data.
  • We compare our method with GSN to illustrate the advantages of using 3D data.
  • Results show 3D consistency of the 3D structure.
  • GSN fails to learn the appearance of the city and its latent space fails to understand 3D information.
  • Quantitative evaluation shows InfiniCity substantially outperforms GSN.

Interactive sampling gui

  • Generative models have difficulty maintaining consistent quality over large images.
  • Artifacts can occur, such as bridges suddenly terminating in the middle of water.
  • An interactive sampling GUI is developed to give imperfect images a second chance.

Experiments

Dataset processing

  • InfiniCity algorithm extracts data modalities
  • HoliCity is a large-scale dataset based on 3D London CAD model
  • Dataset contains 50,024 images registered to CAD model
  • Subset of CAD model used to train and evaluate algorithm
  • Point sampling and voxel resolution used to partition space
  • Octree created with 64 voxels on each edge
  • Voxels scanned and projected onto 2D image

Infinite 2d map generation

  • Synthesized satellite images with method across categorical, depth, and normal modalities
  • User can interactively resample local latent variables within sub-region of map
  • Contrastive patch discriminator improves generator quality

Conclusion

  • Propose InfiniCity, a novel framework for unbounded 3D environment synthesis
  • Produces high-quality and high-diversity results
  • Plausible, traversable, easily editable structures at an infinite scale
  • Quality of final rendering is bounded by neural rendering
  • InfiniCity consists of three major modules
  • Interactive resampling allows users to select region of interest
  • Synthesized satellite maps with contrastive discriminator
  • Octree-based voxel completion
  • Voxel-based neural rendering
  • Trajectory-wise image rendering results show better quality, structural consistency, and diversity
  • Traversable and consistent 3D city rendering