Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Industries are moving towards modeling massive 3D virtual worlds
  • Need for content creation tools that can scale in terms of quantity, quality, and diversity of 3D content
  • Aim to train performant 3D generative models that synthesize textured meshes
  • Prior works lack geometric details, limited in mesh topology, don’t support textures, or use neural renderers
  • Introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures
  • GET3D able to generate high-quality 3D textured meshes

Paper Content

Introduction

  • 3D content is important for many industries
  • Manual creation of 3D assets is time-consuming and requires technical and artistic skills
  • Creating many 3D models is difficult
  • Generative 3D networks can produce high-quality and diverse 3D assets
  • Requirements for 3D generative models: detailed geometry, arbitrary topology, textured mesh, 2D image supervision
  • Prior work has focused on subsets of the requirements
  • GET3D is a novel approach that fulfills all requirements
  • GET3D can generate high-quality geometric and texture details
  • GET3D can be adapted to other tasks, such as material and lighting effects, text-guided 3D shape generation
  • 3D generative models have been developed to generate photorealistic images
  • 3D generative models focus mainly on generating geometry and disregard appearance
  • GET3D is able to generate diverse shapes with arbitrary topology, high quality geometry, and texture
  • 3D-aware generative image synthesis has been developed to tackle the problem of 3D-aware image synthesis
  • GET3D directly outputs textured 3D meshes that can be readily used in standard graphics engines

Method

  • GET3D framework synthesizes textured 3D shapes
  • Generation process is split into two parts: geometry branch and texture branch
  • Training uses efficient differentiable rasterizer to render textured mesh into 2D images
  • Model is differentiable, allowing for adversarial training from images
  • Generator and rendering/loss functions introduced in Sec 3.1 and 3.2

Generative model of 3d textured meshes

  • Aim to learn a 3D generator to map a sample from a Gaussian distribution to a mesh with texture
  • Sample two random input vectors
  • Use non-linear mapping networks to map input vectors to intermediate latent vectors
  • Utilize DMTet to extract 3D surface mesh from SDF
  • Train with adversarial losses defined on 2D images
  • Use a rasterization-based differentiable renderer to obtain RGB images and silhouettes
  • Utilize two 2D discriminators to classify inputs as real or fake
  • Model is end-to-end trainable

Geometry generator

  • DMTet is a differentiable surface representation
  • It represents geometry as a signed distance field (SDF) defined on a deformable tetrahedral grid
  • Deforming the grid results in better resolution
  • It allows for explicit meshes with arbitrary topology and genus
  • SDF values and deformations are mapped to vertices using 3D convolutional and fully connected layers
  • Differentiable marching tetrahedra is used to extract the explicit mesh
  • Mesh topology is determined by the signs of SDF values
  • Shapes with arbitrary topology can be generated by predicting different signs of SDF values

Texture generator

  • Generating a texture map for an output mesh is difficult.
  • Texture is modeled as a function that maps 3D location to RGB color.
  • Texture field is conditioned on geometry latent code.
  • Texture field is represented using a tri-plane representation.
  • Feature vector of a surface point is recovered using projection and interpolation.
  • Texture field is only sampled at surface points.

Differentiable rendering and training

  • We draw inspiration from Nvdiffrec to supervise our model during training.
  • We render the 3D mesh and texture field into 2D images using a differentiable renderer.
  • We use a 2D discriminator to distinguish the image from a real object or rendered from the generated object.
  • We use an adversarial objective with two separate discriminators for RGB images and silhouettes.

Experiments

  • Conducted extensive experiments to evaluate model
  • Compared quality of 3D textured meshes generated by GET3D to existing methods
  • Ablated design choices in section
  • Demonstrated flexibility of GET3D by adapting to downstream applications

Experiments on synthetic datasets

  • Datasets used for evaluation on ShapeNet: Car, Chair, and Motorbike
  • Training, validation, and test sets randomly split from each category
  • Camera poses randomly sampled from the upper hemisphere of each shape
  • Animal dataset from TurboSquid used for textures
  • House dataset from TurboSquid and Human Body dataset from Renderpeople used for qualitative results
  • Baselines: PointFlow, OccNet, GRAF, PiGAN, and EG3D
  • Metrics used to evaluate quality of synthesis: Chamfer Distance, Light Field Distance, Coverage score, Minimum Matching Distance, FID
  • Quantitative and qualitative results provided
  • Ablation study conducted
  • Text-guided 3D content synthesis supported
  • Limitations of GET3D: relies on 2D silhouettes and camera distribution during training, trained per-category
  • Broader Impact: potential misuse or harmful applications, need to de-bias datasets