
  • Introducing 3DBiCar, a large-scale dataset of 3D biped cartoon characters
  • Introducing RaBit, a parametric model built upon 3DBiCar
  • Applications of 3DBiCar and RaBit include single-view reconstruction, sketch-based modeling, and 3D cartoon animation
  • Part-sensitive texture reasoner used to make local areas more detailed

Paper Content


  • Rapid development of digitization leads to demand for high-quality 3D articulated characters
  • Creating 3D characters is labor-intensive and time-consuming
  • 3D sensing devices make capturing 3D data from the real world convenient
  • Parametric models and deep learning techniques can infer accurate 3D digital humans from single-view images or sparse sketches
  • Introduction of 3DBiCar, a large-scale 3D biped cartoon character dataset with 1,500 high-quality 3D models
  • RaBit, a generative model for 3D biped cartoon character generation
  • BiCarNet, a baseline method for single-view reconstruction
  • Applications of sketch-based modeling and 3D character animation
  • 3D character datasets can be categorized as real-captured and computer-designed
  • Real-captured datasets focus on human faces and bodies
  • FaceWarehouse and FaceScape collect 3D faces with high diversity
  • The CAESAR dataset is widely used for body shape modeling
  • Computer-designed datasets lack diversity and are unsatisfactory
  • 3DCaricShop and SimpModeling target non-realistic character heads
  • 3DBiCar is a large 3D biped cartoon character dataset
  • Parametric shape modeling uses PCA and 3DMM
  • Parametric texture modeling uses deep neural networks
  • GANFIT, StylePeople, and GET3D use neural texture synthesis


  • Digitizing realistic and articulated human characters has made progress, but creating visually plausible biped cartoon characters is still difficult.
  • 3DBiCar is a large-scale full-body 3D biped character dataset.
  • 3DBiCar contains 1,500 high-quality 3D models with diverse identities, shapes, and textural styles.
  • 3DBiCar has a unified mesh topology and provides various forms of data for each character.

Parametric modeling

  • Proposed first parametric model of 3D biped cartoon characters (RaBit)
  • Model contains linear blend model for shapes and neural generator for textures
  • Parametric space decomposed into identity-related body parameter B, non-rigid pose-related parameter Θ, and texture-related parameter T

Shape modeling

  • Linear shape models are used to represent 3D models.
  • PCA has been used to model the human body and face.
  • An equation is used to parameterize character shape linearly.
  • PCA is used to learn the shape model of RaBit from 1,050 characters.
  • Eyeballs can be computed based on predefined landmarks.
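
The linear shape model described above can be sketched in a few lines of NumPy. This is a minimal illustration, not RaBit's actual code: the function names, the toy mesh sizes, and the use of SVD to compute the PCA basis are all assumptions made for the example. It relies only on the fact that every mesh shares one topology, so each mesh flattens to a vector of the same length.

```python
import numpy as np

def fit_pca_shape_model(meshes, num_components):
    """Fit a PCA basis to registered meshes of shape (N, V, 3)."""
    n = meshes.shape[0]
    flat = meshes.reshape(n, -1)                 # (N, 3V), one row per character
    mean = flat.mean(axis=0)                     # mean shape
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean, vt[:num_components]             # basis: (K, 3V)

def decode_shape(mean, basis, body_params):
    """Linear model: mean shape plus a weighted sum of basis vectors."""
    return (mean + body_params @ basis).reshape(-1, 3)

def encode_shape(mean, basis, mesh):
    """Project a mesh onto the basis to get its body parameters."""
    return basis @ (mesh.reshape(-1) - mean)

# Toy data standing in for the training characters (sizes are illustrative).
rng = np.random.default_rng(0)
toy_meshes = rng.normal(size=(20, 50, 3))
mean, basis = fit_pca_shape_model(toy_meshes, num_components=5)
body_params = encode_shape(mean, basis, toy_meshes[0])
approx_mesh = decode_shape(mean, basis, body_params)
```

Setting all body parameters to zero returns the mean shape, and increasing the number of components trades compactness for reconstruction accuracy.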

Pose modeling

  • RaBit uses a vertex-based linear blend skinning technique.
  • Pose parameter Θ is a set of angles.
  • Pose function F_P maps each vertex from the rest pose to the posed mesh.
  • G_k(Θ, J) is the global transformation of joint k.
  • A(k) is the set of all ancestors of joint k.
  • J_j is the location of the j-th joint.
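
A rough sketch of how these pieces fit together in vertex-based linear blend skinning follows. It assumes an SMPL-style formulation: each joint's angle is an axis-angle vector, the skeleton is given as a `parents` list with parents listed before children, and local translations are joint offsets relative to the parent. None of these details are confirmed by the summary above, and all function names are hypothetical.

```python
import numpy as np

def rodrigues(axis_angle):
    """Convert an axis-angle vector (3,) to a 3x3 rotation matrix."""
    angle = np.linalg.norm(axis_angle)
    if angle < 1e-8:
        return np.eye(3)
    x, y, z = axis_angle / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def global_transforms(theta, joints, parents):
    """Compose local transforms along each joint's ancestor chain A(k).

    theta: (K, 3) axis-angle per joint, joints: (K, 3) rest locations J,
    parents: parent index per joint, -1 for the root.
    """
    num_joints = len(parents)
    G = np.zeros((num_joints, 4, 4))
    for k in range(num_joints):
        local = np.eye(4)
        local[:3, :3] = rodrigues(theta[k])
        if parents[k] < 0:
            local[:3, 3] = joints[k]
            G[k] = local
        else:
            local[:3, 3] = joints[k] - joints[parents[k]]
            G[k] = G[parents[k]] @ local
    # Remove the rest-pose offset so each transform acts on rest-pose vertices.
    for k in range(num_joints):
        G[k, :3, 3] -= G[k, :3, :3] @ joints[k]
    return G

def linear_blend_skinning(vertices, weights, G):
    """Blend per-joint transforms with skinning weights (V, K) and apply them."""
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    blended = np.einsum("vk,kij->vij", weights, G)    # per-vertex 4x4 transform
    return np.einsum("vij,vj->vi", blended, homo)[:, :3]

# Two-joint toy chain: rotate the child joint by 90 degrees about z.
joints = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
parents = [-1, 0]
vertices = np.array([[2.0, 0.0, 0.0]])
weights = np.array([[0.0, 1.0]])                      # bound fully to the child
theta = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, np.pi / 2]])
posed = linear_blend_skinning(vertices, weights, global_transforms(theta, joints, parents))
```

With all angles set to zero every transform reduces to the identity, so the mesh stays in its rest pose.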

Texture modeling

  • Traditional linear PCA can build a decent statistical shape model, but cannot represent high-frequency details in textures.
  • GAN-based architectures have shown the capability of generating high-fidelity images.
  • StyleGAN2-based techniques are used to generate UV texture maps with a coherent UV unfolding.
  • Neural texture generator translates a latent code to a texture map.
  • Textured mesh is generated by applying the texture map to the mesh model.
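
The last step, applying a UV texture map to a mesh, can be illustrated with a simple per-vertex lookup. This is only a sketch under stated assumptions: the texture here is a tiny hand-written array rather than the output of a neural generator, the lookup is nearest-neighbor, and the convention that v = 0 is the bottom row of the image is assumed rather than taken from the paper. The function name is hypothetical.

```python
import numpy as np

def sample_texture(texture, uv):
    """Look up a color for each vertex from a UV texture map.

    texture: (H, W, 3) image, uv: (V, 2) coordinates in [0, 1].
    Assumes v = 0 maps to the bottom image row (a common convention).
    """
    h, w = texture.shape[:2]
    cols = np.clip(np.rint(uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.rint((1.0 - uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    return texture[rows, cols]

# A 2x2 toy texture: top row red/green, bottom row blue/white.
texture = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)
uv = np.array([[0.0, 0.0], [1.0, 1.0]])
vertex_colors = sample_texture(texture, uv)
```

Because every character shares one UV unfolding, the same `uv` array would work for any generated texture map; a real renderer would interpolate per pixel rather than per vertex.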

Single-view reconstruction

  • Single-view reconstruction is a popular task for 3D content generation
  • BiCarNet is a baseline learning-based method for reconstructing 3D shape, pose, and texture from a single masked image of cartoon characters
  • A part-sensitive texture reasoner (PSR) is used to address the loss of detailed appearance in small areas
  • Five individual UV-mappings are designed for significant parts of the cartoon character
  • Fuser is used to address blending artifacts


  • Split 3DBiCar into training and testing set
  • Generate a large number of synthetic paired data
  • 13,650 pairs for training
  • BiCarNet takes an image with foreground masked as input
  • BiCarNet can generate vivid 3D cartoon characters
  • HMR-like blocks and RaBit for shape and pose learning
  • Compare 3 methods for shape reconstruction
  • Compare GAN-based texture generator with PCA-based inference
  • Ablative analysis on BiCarNet without Fuser and Part-sensitive Reasoner
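
The data figures can be checked with a short sketch. The shape model is learned from 1,050 characters, which leaves 450 of the 1,500 models for testing, and 13,650 / 1,050 = 13, which suggests 13 image-model pairs per training character. That per-character count is an inference from the numbers, not something stated here, and the random split below is a hypothetical stand-in for however the actual split was chosen.

```python
import random

def split_dataset(ids, num_train, seed=0):
    """Shuffle character ids and split them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:]

NUM_CHARACTERS = 1500
NUM_TRAIN = 1050
NUM_TRAINING_PAIRS = 13650

train_ids, test_ids = split_dataset(range(NUM_CHARACTERS), NUM_TRAIN)

# Inferred, not stated: training pairs divide evenly over training characters.
pairs_per_character = NUM_TRAINING_PAIRS // len(train_ids)
```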

Sketch-based modeling

  • Customizing 3D biped cartoon characters usually requires a lot of work with commercial tools.
  • Sketch-based modeling allows amateur users to customize 3D shapes in a simple and intuitive way.
  • 12,000 T-pose models were generated by sampling shape vectors and using RaBit.
  • 108,000 sketch-model pairs were created using suggestive contour.
  • ResNet-50 and MLPs were used to map input sketches to 100-dimensional shape parameters.
  • Output characters are animation-ready and can be used with other commercial tools.
  • Fig. 10 shows sketches created by users and corresponding models generated by the system.
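
A minimal sketch of the regression head is given below, with the ResNet-50 backbone replaced by random feature vectors so the example stays self-contained. ResNet-50's pooled feature is 2,048-dimensional; the hidden layer sizes, the reading of the MLPs as a stack of three fully connected layers, and all names are assumptions for illustration. The pair count also implies 108,000 / 12,000 = 9 sketches per model, which is an inference from the figures above.

```python
import numpy as np

FEATURE_DIM = 2048          # pooled ResNet-50 feature size
SHAPE_DIM = 100             # shape parameters predicted per sketch
SKETCHES_PER_MODEL = 108_000 // 12_000   # inferred from the pair counts

def init_mlp(sizes, seed=0):
    """Create (weight, bias) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def mlp_forward(params, features):
    """Apply the layers with ReLU between them and a linear output."""
    x = features
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hidden sizes 512 and 256 are placeholders, not values from the paper.
head = init_mlp([FEATURE_DIM, 512, 256, SHAPE_DIM])
fake_features = np.random.default_rng(1).normal(size=(4, FEATURE_DIM))
predicted_shapes = mlp_forward(head, fake_features)   # one 100-d vector per sketch
```

The predicted shape vector would then be decoded by the parametric model into a T-pose mesh, which is what keeps the output animation-ready.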

3D character animation

  • Extract human from video frames and use temporal-aware encoder to recover sequence of poses
  • Use motion retargeting to convert poses to motion of cartoon characters
  • Animation-ready characters generated by RaBit can be used for 3D animation


  • 3DBiCar is the first large-scale 3D biped cartoon character dataset
  • It contains 1,500 textured and skinned models with a consistent mesh topology
  • RaBit is the first 3D full-body cartoon parametric model
  • BiCarNet is a baseline method for reconstructing 3D textured models from a single image with cartoon characters
  • Experimental results demonstrate the capability of 3DBiCar and RaBit as well as the effectiveness of BiCarNet
  • Two applications, i.e., sketch-based modeling and 3D character animation, demonstrate the usability and practicality of the dataset and parametric model
  • 4 image styles are defined based on their different sources: picture book, computer designed, hand drawn, and toy
  • Shape model is learned from 1,050 models of 3DBiCar using PCA
  • Pose modeling utilizes the consistent skeleton and skinning weight matrix defined in 3DBiCar
  • Texture generator follows the architecture of StyleGAN2
  • 3DBiCar is split into a training set (1,050 image-model pairs) and a testing set (450 pairs)
  • Synthetic paired data is augmented with the help of RaBit
  • Eyeball is approximated as a sphere
  • Sketch-based modeling interface is implemented with the Qt framework
  • 12,000 shape vectors are randomly sampled and fed to RaBit to generate 3D cartoon characters
  • Suggestive contour is applied to render the front-view sketches with different abstraction levels
  • ResNet-50 module and three MLPs are used as the encoder-decoder architecture
  • pSp-encoder is used to learn a 512-dimensional texture vector from the image
  • pSp is used as the basic building block to learn multiple local UV textures
  • pix2pixHD is used as the fusion module (Fuser)