Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Close-up facial images often have perspective distortion.
Proposed method for correcting perspective distortion in a single close-up face.
Method uses GAN inversion and joint optimization of camera parameters and face latent code.
Method uses focal length reparameterization, optimization scheduling, and geometric regularization.
Results show improved visual quality compared to previous approaches.

Paper Content

Introduction

Millions of people take smartphone selfies every day
Smartphones have high-quality cameras
Selfies suffer from perspective distortion
Perspective distortion makes faces look unnatural and asymmetric
Existing methods aim to correct distortion using reconstruction-based and learning-based warping
3D GAN inversion proposed to correct distortion
3D GAN inversion estimates facial geometry and camera-to-face distance
Optimization of parameters is ill-posed
Three designs proposed to address problem
Quantitative evaluation protocol established

Related work

Selfie photos taken from close distances often exhibit perspective distortions
People are bothered by distorted facial features
Existing smartphones attempt to persuade people to take selfies from a longer distance
Existing perspective distortion methods have difficulty handling severe distortions
3D face reconstruction from a single image is challenging
Existing methods are limited to reconstructing only the face
Prior works focus on normalizing head pose
Conditional generative models learn a face-specific GAN to generate a target face pose
2D GAN inversion methods optimize the latent code for a single image
3D GAN inversion approaches optimize the face latent code and part of the camera parameters
Jointly estimating face shape, camera-to-face distance, and focal length is challenging

Method

Aim to manipulate camera-to-subject distance of single close-up face portrait
Propose 3D GAN inversion to invert portrait to corresponding face latent code and camera parameters
Adjust camera parameters according to user preference, especially camera-to-subject distance and focal length
Develop workflow to warp and blend regions to compose full-frame image/video

Preliminary

StyleGAN maps random samples from a normal distribution to an intermediate latent vector
3D GAN uses additional camera parameters and a neural renderer to generate the final image
Training and inversion of 3D GANs require aligning and cropping the face
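
To make "additional camera parameters" concrete, the sketch below packs a camera pose and intrinsics into one conditioning vector. It assumes an EG3D-style layout (a flattened 4x4 camera-to-world matrix followed by a flattened 3x3 normalized intrinsics matrix, 25 values in total). The function name, the axis convention, and the example numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def make_camera_label(distance: float, focal: float) -> np.ndarray:
    """Build a 25-value camera vector for a frontal camera (assumed layout)."""
    # Camera sits on the +z axis at `distance` and looks toward the origin.
    # The rotation flips y and z so the camera's +z axis points at the face.
    cam2world = np.array([
        [1.0,  0.0,  0.0, 0.0],
        [0.0, -1.0,  0.0, 0.0],
        [0.0,  0.0, -1.0, distance],
        [0.0,  0.0,  0.0, 1.0],
    ])
    # Intrinsics normalized by image size: principal point at the image centre.
    intrinsics = np.array([
        [focal, 0.0,   0.5],
        [0.0,   focal, 0.5],
        [0.0,   0.0,   1.0],
    ])
    return np.concatenate([cam2world.ravel(), intrinsics.ravel()])

label = make_camera_label(distance=2.7, focal=4.26)
```

Changing `distance` and `focal` in such a vector is what lets a 3D GAN re-render the same face from a different virtual camera.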

Perspective-aware 3D GAN inversion

3D GAN with additional camera parameters can enable camera-controllable image generation
Inversion process is complicated when using single-face image
Problem is ill-posed, meaning multiple combinations of focal length, camera-to-subject distance, and face shape can match input image
Existing 3D GAN inversions focus on far camera-to-subject distances
Accurate estimation of both camera-to-subject distance and focal length is necessary for near-range camera-to-subject distances
Focal length reparameterization, optimization scheduling, and landmark regularization proposed to ease ill-posedness and improve facial geometry and rendering results
Start from close camera-to-subject distance to ease optimization
Optimization of face and camera parameters is asynchronous
Uncertainty-based landmark loss used to increase sensitivity to camera-to-subject distance variation
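
The ambiguity between focal length and camera-to-subject distance can be illustrated with a plain pinhole camera. In the sketch below, the toy landmark coordinates and the coupling `focal = face_scale * distance` are assumptions made for illustration; the paper's exact reparameterization may differ. The point is that focal length only rescales the image, while distance alone controls how large the nose appears relative to the ears.

```python
import numpy as np

# Toy face landmarks in metres (hypothetical values); +z points toward the camera.
NOSE_WINGS = np.array([[-0.02, 0.0, 0.03], [0.02, 0.0, 0.03]])
EARS = np.array([[-0.08, 0.0, -0.08], [0.08, 0.0, -0.08]])

def project(points, focal, distance):
    """Pinhole projection for a camera on the +z axis looking at the face."""
    depth = distance - points[:, 2]  # points closer to the camera have smaller depth
    return focal * points[:, :2] / depth[:, None]

def width(points, focal, distance):
    """Horizontal image-space distance between a left/right landmark pair."""
    xy = project(points, focal, distance)
    return float(abs(xy[1, 0] - xy[0, 0]))

def nose_to_ear_ratio(focal, distance):
    """Perspective distortion proxy: apparent nose width over ear-to-ear width."""
    return width(NOSE_WINGS, focal, distance) / width(EARS, focal, distance)

def focal_from_distance(distance, face_scale):
    """Assumed coupling: tie focal length to distance so the face plane keeps its size."""
    return face_scale * distance

near = nose_to_ear_ratio(focal=1.0, distance=0.3)
far = nose_to_ear_ratio(focal=1.0, distance=2.0)
```

With these toy values the ratio is about 0.35 at 0.3 m and about 0.26 at 2 m, and it does not change with `focal`. Coupling focal length to distance keeps the projected face size fixed, so an optimizer varying distance changes only the distortion.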

Stitching

3D GAN inversion method can manipulate camera distance and focal length to render virtual images
System developed to stitch reprojected face with original full image
Algorithm aligns and blends depth from 3D GAN and depth estimated for full image
Entire image projected to far distance using same camera parameter as 3D GAN
Generator fine-tuned to make border of synthesis close to warped full image
Refined synthetic far image and warped full image blended to produce complete image
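
The depth alignment and blending steps can be sketched as follows. Monocular depth is typically defined only up to an unknown scale and shift, so one simple option, assumed here and not confirmed by the source, is a least-squares fit of scale and shift over the face region, followed by alpha blending. The function names and the mask and alpha inputs are hypothetical.

```python
import numpy as np

def align_depth(src_depth, ref_depth, mask):
    """Fit scale and shift so that scale * src_depth + shift matches ref_depth inside mask."""
    x = src_depth[mask]
    y = ref_depth[mask]
    design = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(design, y, rcond=None)
    # Apply the fitted transform to the whole map, not only the masked region.
    return scale * src_depth + shift

def blend(face_img, full_img, alpha):
    """Per-pixel alpha blend of two H x W x 3 images; alpha is H x W in [0, 1]."""
    a = alpha[..., None]
    return a * face_img + (1.0 - a) * full_img
```

Here `src_depth` would be the full-image depth estimate, `ref_depth` the depth rendered by the 3D GAN, and `mask` the region where both are valid.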

Implementation details

Learning rates set to 1×10⁻², 5×10⁻³, and 3×10⁻⁴
EG3D pretrained on FFHQ used in experiments
Camera parameters initialized using the method of Deng et al.
MiDaS used to estimate monocular depth
3D Photo inpainting used to reproject background
Stable Diffusion or DALL·E 2 used to inpaint background if severely damaged

Experiments

Experimental setup

CMDP dataset contains portrait images of different people taken from various distances
USC perspective portrait database contains images with single faces with different levels of perspective distortions
In-the-wild images are used for visual comparisons
Proposed method compared with two existing portrait perspective correction methods
Four evaluation metrics used to evaluate performance of portrait perspective correction: Euclidean distance landmark error, PSNR, SSIM, and LPIPS
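
Two of the four metrics are simple enough to write out directly. The sketch below is a minimal version of the landmark error and PSNR, assuming landmarks are given as N x 2 pixel coordinates and images are float arrays in [0, 1]; the function names are illustrative. SSIM and LPIPS are usually taken from existing libraries and are not reproduced here.

```python
import numpy as np

def landmark_error(pred, target):
    """Mean Euclidean distance between corresponding 2D landmarks (N x 2 arrays)."""
    return float(np.mean(np.linalg.norm(pred - target, axis=1)))

def psnr(img_a, img_b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means the images are closer."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Lower is better for landmark error and LPIPS, while higher is better for PSNR and SSIM.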

Quantitative evaluation

Our proposed method performs well in the landmark metric on the CMDP Dataset
Our implementation of [28] is close to the original version and performs better in the landmark metric and slightly worse in photometric metrics

Qualitative evaluation

Our method generates faces with fewer perspective distortions and preserves identity.
3D GAN inversion is an effective way of portrait perspective correction compared to flow-based warping methods.

Ablation study

Ablation studies conducted on CMDP dataset and severely distorted face images
Without proposed designs, optimization gets stuck in sub-optimal solution
Focal length reparameterization and distance initialization are critical
Optimization scheduling is important but not essential
Stitching post-processing is necessary for seamless blending

Other applications

Our method improves the editing ability of 3D GAN on perspective-distorted faces.
Our method enables us to edit safely and correct distortion well for partially-occluded faces.

Failure modes

Our method fails for out-of-distribution faces, like extreme expressions, occluded faces, and faces with high-frequency details.
GAN inversion may hallucinate a face according to its learned prior, which can produce severe artifacts.
GAN inversion may ignore high-frequency details and output a smoothed-out face.

Conclusions

Presents a method for portrait perspective distortion correction
Leverages a 3D GAN inversion method to recover facial geometry
Explores optimization scheduling, focal length reparameterization, and close-up camera-to-face distance initialization
Establishes a protocol of quantitative evaluation
Quantitative and visual comparisons demonstrate improved performance over existing methods
Editing ability improved with method
Evaluated on Caltech Multi-Distance Portraits Dataset
