Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Close-up facial images often have perspective distortion.
  • Proposed method for correcting perspective distortion in a single close-up face.
  • Method uses GAN inversion and joint optimization of camera parameters and face latent code.
  • Method uses focal length reparameterization, optimization scheduling, and geometric regularization.
  • Results show improved visual quality compared to previous approaches.

Paper Content

Introduction

  • Millions of people take smartphone selfies every day
  • Smartphones have high-quality cameras
  • Selfies suffer from perspective distortion
  • Perspective distortion makes faces look unnatural and asymmetric
  • Existing methods aim to correct distortion using reconstruction-based and learning-based warping
  • 3D GAN inversion proposed to correct distortion
  • 3D GAN inversion estimates facial geometry and camera-to-face distance
  • Optimization of parameters is ill-posed
  • Three designs proposed to address problem
  • Quantitative evaluation protocol established
  • Selfie photos taken from close distances often exhibit perspective distortions
  • People are bothered by distorted facial features
  • Existing smartphones attempt to persuade people to take selfies from a longer distance
  • Existing perspective distortion methods have difficulty handling severe distortions
  • 3D face reconstruction from a single image is challenging
  • Existing methods are limited to reconstructing only the face
  • Prior works focus on normalizing head pose
  • Conditional generative models learn a face-specific GAN to generate a target face pose
  • 2D GAN inversion methods optimize the latent code for a single image
  • 3D GAN inversion approaches optimize the face latent code and part of the camera parameters
  • Jointly estimating face shape, camera-to-face distance, and focal length is challenging
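
The distortion described above follows from pinhole projection: features closer to the camera are magnified more than features farther away, and the effect grows as the camera approaches. The sketch below is a minimal illustration, not the paper's method. The face measurements, the distances, and the names `project` and `nose_to_ear_ratio` are assumptions chosen for the example.

```python
import numpy as np

def project(points, focal, distance):
    """Pinhole projection of 3D points (x, y, z) for a camera on the +z axis.

    The camera sits at z = distance and looks toward the origin, so a point
    with larger z (such as the nose tip) is closer to the camera.
    """
    points = np.asarray(points, dtype=float)
    depth = distance - points[:, 2]  # camera-to-point distance along the axis
    return focal * points[:, :2] / depth[:, None]

# Hypothetical measurements in metres: the nose sits about 10 cm in front
# of the ears. These numbers are illustrative only.
nose = np.array([[0.02, 0.0, 0.10]])
ear = np.array([[0.07, 0.0, 0.00]])

def nose_to_ear_ratio(distance, focal=1.0):
    """Projected nose offset divided by projected ear offset."""
    nose_x = project(nose, focal, distance)[0, 0]
    ear_x = project(ear, focal, distance)[0, 0]
    return nose_x / ear_x

close = nose_to_ear_ratio(0.3)  # assumed arm's-length selfie distance
far = nose_to_ear_ratio(2.0)    # assumed portrait distance
```

With these numbers the nose appears larger relative to the ears at 0.3 m than at 2.0 m. The ratio does not depend on the focal length, which is why zooming alone cannot undo the distortion.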

Method

  • Aim to manipulate camera-to-subject distance of single close-up face portrait
  • Propose 3D GAN inversion to invert portrait to corresponding face latent code and camera parameters
  • Adjust camera parameters according to user preference, especially camera-to-subject distance and focal length
  • Develop workflow to warp and blend regions to compose full-frame image/video

Preliminary

  • StyleGAN maps random samples from a normal distribution to an intermediate latent vector
  • 3D GAN uses additional camera parameters and a neural renderer to generate the final image
  • Training and inversion of 3D GANs require aligning and cropping the face

Perspective-aware 3D GAN inversion

  • 3D GAN with additional camera parameters can enable camera-controllable image generation
  • Inversion process is complicated when using single-face image
  • Problem is ill-posed, meaning multiple combinations of focal length, camera-to-subject distance, and face shape can match input image
  • Existing 3D GAN inversions focus on far camera-to-subject distances
  • Accurate estimation of both camera-to-subject distance and focal length is necessary for near-range camera-to-subject distances
  • Focal length reparameterization, optimization scheduling, and landmark regularization proposed to ease ill-posedness and improve facial geometry and rendering results
  • Start from close camera-to-subject distance to ease optimization
  • Optimization of face and camera parameters is asynchronous
  • Uncertainty-based landmark loss used to increase sensitivity to camera-to-subject variation

Stitching

  • 3D GAN inversion method can manipulate camera distance and focal length to render virtual images
  • System developed to stitch reprojected face with original full image
  • Algorithm aligns and blends depth from 3D GAN and depth estimated for full image
  • Entire image projected to far distance using same camera parameter as 3D GAN
  • Generator fine-tuned to make border of synthesis close to warped full image
  • Refined synthetic far image and warped full image blended to produce complete image

Implementation details

  • Learning rates set to 1×10⁻², 5×10⁻³, and 3×10⁻⁴
  • EG3D pretrained on FFHQ used in experiments
  • Camera parameters initialized using Deng et al.
  • MiDaS used to estimate monocular depth
  • 3D Photo inpainting used to reproject background
  • Stable Diffusion or DALL·E 2 used to inpaint the background if it is severely damaged

Experiments

Experimental setup

  • CMDP (Caltech Multi-Distance Portraits) dataset contains portrait images of different people taken from various distances
  • USC perspective portrait database contains images with single faces with different levels of perspective distortions
  • In-the-wild images are used for visual comparisons
  • Comparing proposed methods with two existing portrait perspective correction methods
  • Four evaluation metrics used to evaluate performance of portrait perspective correction: Euclidean distance landmark error, PSNR, SSIM, and LPIPS

Quantitative evaluation

  • Our proposed method performs well in the landmark metric on the CMDP Dataset
  • Our implementation of [28] is close to the original version and performs better in the landmark metric and slightly worse in photometric metrics

Qualitative evaluation

  • Our method generates faces with fewer perspective distortions and preserves identity.
  • 3D GAN inversion is an effective approach to portrait perspective correction compared with flow-based warping methods.

Ablation study

  • Ablation studies conducted on CMDP dataset and severely distorted face images
  • Without proposed designs, optimization gets stuck in sub-optimal solution
  • Focal length reparameterization and distance initialization are critical
  • Optimization scheduling is important but not essential
  • Stitching post-processing is necessary for seamless blending

Other applications

  • Our method improves the editing ability of 3D GAN on perspective-distorted faces.
  • Our method allows reliable editing and distortion correction for partially occluded faces.

Failure modes

  • Our method fails for out-of-distribution faces, like extreme expressions, occluded faces, and faces with high-frequency details.
  • GAN inversion may generate a face according to its own learned prior, which can produce severe artifacts.
  • GAN inversion may ignore high-frequency details and output a smoothed-out face.

Conclusions

  • Presents a method for portrait perspective distortion correction
  • Leverages a 3D GAN inversion method to recover facial geometry
  • Explores optimization scheduling, focal length reparameterization, and closeup camera-to-face distance initialization
  • Establishes a protocol of quantitative evaluation
  • Quantitative and visual comparisons demonstrate improved performance over existing methods
  • Editing ability improved with method
  • Evaluated on Caltech Multi-Distance Portraits Dataset