Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Industries are moving towards modeling massive 3D virtual worlds
- Need for content creation tools that can scale in terms of quantity, quality, and diversity of 3D content
- Aim to train performant 3D generative models that synthesize textured meshes
- Prior works lack geometric details, limited in mesh topology, don’t support textures, or use neural renderers
- Introduce GET3D, a Generative model that directly generates Explicit Textured 3D meshes with complex topology, rich geometric details, and high-fidelity textures
- GET3D able to generate high-quality 3D textured meshes
Paper Content
Introduction
- 3D content is important for many industries
- Manual creation of 3D assets is time-consuming and requires technical and artistic skills
- Creating many 3D models is difficult
- Generative 3D networks can produce high-quality and diverse 3D assets
- Requirements for 3D generative models: detailed geometry, arbitrary topology, textured mesh, 2D image supervision
- Prior work has focused on subsets of the requirements
- GET3D is a novel approach that fulfills all requirements
- GET3D can generate high-quality geometric and texture details
- GET3D can be adapted to other tasks, such as material and lighting effects, text-guided 3D shape generation
Related work
- 3D generative models have been developed to generate photorealistic images
- 3D generative models focus mainly on generating geometry and disregard appearance
- GET3D is able to generate diverse shapes with arbitrary topology, high quality geometry, and texture
- 3D-aware generative image synthesis has been developed to tackle the problem of 3D-aware image synthesis
- GET3D directly outputs textured 3D meshes that can be readily used in standard graphics engines
Method
- GET3D framework synthesizes textured 3D shapes
- Generation process is split into two parts: geometry branch and texture branch
- Training uses efficient differentiable rasterizer to render textured mesh into 2D images
- Model is differentiable, allowing for adversarial training from images
- Generator and rendering/loss functions introduced in Sec 3.1 and 3.2
Generative model of 3d textured meshes
- Aim to learn a 3D generator to map a sample from a Gaussian distribution to a mesh with texture
- Sample two random input vectors
- Use non-linear mapping networks to map input vectors to intermediate latent vectors
- Utilize DMTet to extract 3D surface mesh from SDF
- Train with adversarial losses defined on 2D images
- Use a rasterization-based differentiable renderer to obtain RGB images and silhouettes
- Utilize two 2D discriminators to classify inputs as real or fake
- Model is end-to-end trainable
Geometry generator
- DMTet is a differentiable surface representation
- It represents geometry as a signed distance field (SDF) defined on a deformable tetrahedral grid
- Deforming the grid results in better resolution
- It allows for explicit meshes with arbitrary topology and genus
- SDF values and deformations are mapped to vertices using 3D convolutional and fully connected layers
- Differentiable marching tetrahedra is used to extract the explicit mesh
- Mesh topology is determined by the signs of SDF values
- Shapes with arbitrary topology can be generated by predicting different signs of SDF values
Texture generator
- Generating a texture map for an output mesh is difficult.
- Texture is modeled as a function that maps 3D location to RGB color.
- Texture field is conditioned on geometry latent code.
- Texture field is represented using a tri-plane representation.
- Feature vector of a surface point is recovered using projection and interpolation.
- Texture field is only sampled at surface points.
Differentiable rendering and training
- We draw inspiration from Nvdiffrec to supervise our model during training.
- We render the 3D mesh and texture field into 2D images using a differentiable renderer.
- We use a 2D discriminator to distinguish the image from a real object or rendered from the generated object.
- We use an adversarial objective with two separate discriminators for RGB images and silhouettes.
Experiments
- Conducted extensive experiments to evaluate model
- Compared quality of 3D textured meshes generated by GET3D to existing methods
- Ablated design choices in section
- Demonstrated flexibility of GET3D by adapting to downstream applications
Experiments on synthetic datasets
- Datasets used for evaluation on ShapeNet: Car, Chair, and Motorbike
- Training, validation, and test sets randomly split from each category
- Camera poses randomly sampled from the upper hemisphere of each shape
- Animal dataset from TurboSquid used for textures
- House dataset from TurboSquid and Human Body dataset from Renderpeople used for qualitative results
- Baselines: PointFlow, OccNet, GRAF, PiGAN, and EG3D
- Metrics used to evaluate quality of synthesis: Chamfer Distance, Light Field Distance, Coverage score, Minimum Matching Distance, FID
- Quantitative and qualitative results provided
- Ablation study conducted
- Text-guided 3D content synthesis supported
- Limitations of GET3D: relies on 2D silhouettes and camera distribution during training, trained per-category
- Broader Impact: potential misuse or harmful applications, need to de-bias datasets