Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Close-up facial images often have perspective distortion.
- Proposed method for correcting perspective distortion in a single close-up face.
- Method uses GAN inversion and joint optimization of camera parameters and face latent code.
- Method uses focal length reparameterization, optimization scheduling, and geometric regularization.
- Results show improved visual quality compared to previous approaches.
Paper Content
Introduction
- Millions of people take smartphone selfies every day
- Smartphones have high-quality cameras
- Selfies suffer from perspective distortion
- Perspective distortion makes faces look unnatural and asymmetric
- Existing methods aim to correct distortion using reconstruction-based and learning-based warping
- 3D GAN inversion proposed to correct distortion
- 3D GAN inversion estimates facial geometry and camera-to-face distance
- Optimization of parameters is ill-posed
- Three designs proposed to address problem
- Quantitative evaluation protocol established
Related work
- Selfie photos taken from close distances often exhibit perspective distortions
- People are bothered by distorted facial features
- Existing smartphones attempt to persuade people to take selfies from a longer distance
- Existing perspective distortion methods have difficulty handling severe distortions
- 3D face reconstruction from a single image is challenging
- Existing methods are limited to reconstructing only the face
- Prior works focus on normalizing head pose
- Conditional generative models learn a face-specific GAN to generate a target face pose
- 2D GAN inversion methods optimize the latent code for a single image
- 3D GAN inversion approaches optimize the face latent code and part of the camera parameters
- Jointly estimating face shape, camera-to-face distance, and focal length is challenging
Method
- Aim to manipulate camera-to-subject distance of single close-up face portrait
- Propose 3D GAN inversion to invert portrait to corresponding face latent code and camera parameters
- Adjust camera parameters according to user preference, especially camera-to-subject distance and focal length
- Develop workflow to warp and blend regions to compose full-frame image/video
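The interplay between camera-to-subject distance and focal length can be illustrated with a plain pinhole model. The sketch below is not taken from the paper: the function names and the numbers are hypothetical, and it assumes the goal is to keep the face the same size in the frame while the virtual camera moves back (a "dolly zoom").

```python
import math

def dolly_zoom_focal(focal, distance, new_distance):
    """Focal length that keeps an object at the face plane the same image size
    after the camera moves from `distance` to `new_distance` (pinhole model)."""
    if distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return focal * new_distance / distance

def projected_size(focal, distance, size):
    """Image-plane size of an object of physical `size` located at `distance`."""
    return focal * size / distance

# Hypothetical numbers: a selfie taken at 0.3 m, re-rendered from 1.5 m.
f_near, d_near, d_far = 26.0, 0.3, 1.5   # focal length in mm, distances in m
face_width = 0.15                         # assumed face width in m

f_far = dolly_zoom_focal(f_near, d_near, d_far)
size_near = projected_size(f_near, d_near, face_width)
size_far = projected_size(f_far, d_far, face_width)
print(f_far, size_near, size_far)
```

Under these assumptions the face occupies the same extent in both renders; only the perspective (relative size of near and far facial features) changes.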
Preliminary
- StyleGAN maps random samples from a normal distribution to an intermediate latent vector
- 3D GAN uses additional camera parameters and a neural renderer to generate the final image
- Training and inversion of 3D GANs require aligning and cropping the face
Perspective-aware 3D GAN inversion
- 3D GAN with additional camera parameters can enable camera-controllable image generation
- Inversion process is complicated when only a single face image is available
- Problem is ill-posed, meaning multiple combinations of focal length, camera-to-subject distance, and face shape can match input image
- Existing 3D GAN inversions focus on far camera-to-subject distances
- Accurate estimation of both camera-to-subject distance and focal length is necessary for near-range camera-to-subject distances
- Focal length reparameterization, optimization scheduling, and landmark regularization proposed to ease ill-posedness and improve facial geometry and rendering results
- Start from close camera-to-subject distance to ease optimization
- Optimization of face and camera parameters is asynchronous
- Uncertainty-based landmark loss used to increase sensitivity to camera-to-subject variation
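The ambiguity described above, and why tying focal length to distance can help, can be seen in a toy pinhole example. This is a minimal sketch under stated assumptions: the form `focal = scale * distance` is a hypothetical reparameterization chosen for illustration (the summary does not give the paper's exact formula), and the two landmarks and their coordinates are invented.

```python
import math

def project(x, z, focal, distance):
    """Pinhole projection of a point with lateral offset x and depth offset z
    (positive z is toward the camera), camera `distance` from the face centre."""
    return focal * x / (distance - z)

def focal_from_scale(scale, distance):
    """Hypothetical reparameterization: focal = scale * distance, so `scale`
    fixes the apparent face size and `distance` controls the distortion."""
    return scale * distance

# Toy landmarks in metres: (lateral offset, depth offset from the face plane).
nose = (0.02, 0.05)    # in front of the face plane, closer to the camera
ear = (0.08, -0.08)    # behind the face plane

def landmarks(scale, distance):
    f = focal_from_scale(scale, distance)
    return [project(x, z, f, distance) for x, z in (nose, ear)]

near = landmarks(scale=1.0, distance=0.3)   # close-up selfie
far = landmarks(scale=1.0, distance=3.0)    # distant portrait

# A point on the face plane (z = 0) projects identically at both distances,
# so it carries no information about distance once the scale is fixed.
plane_near = project(0.05, 0.0, focal_from_scale(1.0, 0.3), 0.3)
plane_far = project(0.05, 0.0, focal_from_scale(1.0, 3.0), 3.0)
print(near, far, plane_near, plane_far)
```

In this toy setup only the off-plane landmarks move when the distance changes (the nose grows and the ear shrinks as the camera approaches), which is consistent with the summary's point that landmark cues help constrain the camera-to-subject distance.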
Stitching
- 3D GAN inversion method can manipulate camera distance and focal length to render virtual images
- System developed to stitch reprojected face with original full image
- Algorithm aligns and blends depth from 3D GAN and depth estimated for full image
- Entire image projected to far distance using same camera parameter as 3D GAN
- Generator fine-tuned to make border of synthesis close to warped full image
- Refined synthetic far image and warped full image blended to produce complete image
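One common way to align two depth maps is a least-squares scale-and-shift fit, since monocular depth estimators typically predict depth only up to an unknown scale and offset. The sketch below assumes that approach; the summary does not state which alignment the authors use, and `fit_scale_shift`, `feather`, and the sample values are hypothetical.

```python
def fit_scale_shift(relative_depth, reference_depth):
    """Least-squares (a, b) minimising sum((a * r + b - t) ** 2) over paired samples."""
    n = len(relative_depth)
    if n != len(reference_depth) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_r = sum(relative_depth) / n
    mean_t = sum(reference_depth) / n
    var_r = sum((r - mean_r) ** 2 for r in relative_depth)
    if var_r == 0:
        raise ValueError("relative depth is constant; scale is undefined")
    cov = sum((r - mean_r) * (t - mean_t)
              for r, t in zip(relative_depth, reference_depth))
    a = cov / var_r
    b = mean_t - a * mean_r
    return a, b

def feather(gan_depth, aligned_depth, alpha):
    """Linear blend: alpha = 1 keeps the 3D GAN depth, alpha = 0 the aligned depth."""
    return alpha * gan_depth + (1.0 - alpha) * aligned_depth

# Toy samples from the overlapping face region (values are made up).
relative = [0.1, 0.2, 0.4, 0.8]    # e.g. monocular estimate for the full image
reference = [0.7, 0.9, 1.3, 2.1]   # e.g. depth rendered by the 3D GAN

a, b = fit_scale_shift(relative, reference)
aligned = [a * r + b for r in relative]
print(a, b, aligned)
```

In practice the blend weight would vary smoothly across the face border so that the transition between the two depth sources is not visible.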
Implementation details
- Learning rates set to 1 × 10⁻², 5 × 10⁻³, and 3 × 10⁻⁴
- EG3D pretrained on FFHQ used in experiments
- Camera parameters initialized using Deng et al.
- MiDaS used to estimate monocular depth
- 3D Photo inpainting used to reproject background
- Stable Diffusion or DALLE2 used to inpaint background if severely damaged
Experiments
Experimental setup
- CMDP (Caltech Multi-Distance Portraits) dataset contains portrait images of different people taken from various distances
- USC perspective portrait database contains images with single faces with different levels of perspective distortions
- In-the-wild images are used for visual comparisons
- Comparing proposed methods with two existing portrait perspective correction methods
- Four evaluation metrics used to evaluate performance of portrait perspective correction: Euclidean distance landmark error, PSNR, SSIM, and LPIPS
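Two of the four metrics are simple enough to sketch with the standard library. This is an illustrative implementation, not the authors' evaluation code: the summary does not say whether the landmark error is normalized (for example by inter-ocular distance), so none is applied here, and images are assumed to be flat sequences of pixel values. SSIM and LPIPS need dedicated libraries and are omitted.

```python
import math

def mean_landmark_error(pred, target):
    """Mean Euclidean distance between corresponding 2D landmarks."""
    if len(pred) != len(target) or not pred:
        raise ValueError("landmark lists must be non-empty and equally long")
    return sum(math.dist(p, q) for p, q in zip(pred, target)) / len(pred)

def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    each given as a flat sequence of pixel values."""
    if len(img_a) != len(img_b) or not img_a:
        raise ValueError("images must be non-empty and equally sized")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy inputs: one landmark matches exactly, the other is off by a 3-4-5 triangle.
err = mean_landmark_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
print(err)  # 2.5
```

Lower landmark error and higher PSNR both indicate a corrected portrait that is closer to the reference image taken from a distance.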
Quantitative evaluation
- Our proposed method performs well in the landmark metric on the CMDP Dataset
- Our implementation of [28] is close to the original version and performs better in the landmark metric and slightly worse in photometric metrics
Qualitative evaluation
- Our method generates faces with fewer perspective distortions and preserves identity.
- 3D GAN inversion is an effective way of portrait perspective correction compared to flow-based warping methods.
Ablation study
- Ablation studies conducted on CMDP dataset and severely distorted face images
- Without proposed designs, optimization gets stuck in sub-optimal solution
- Focal length reparameterization and distance initialization are critical
- Optimization scheduling is important but not essential
- Stitching post-processing is necessary for seamless blending
Other applications
- Our method improves the editing ability of 3D GAN on perspective-distorted faces.
- Our method enables us to edit safely and correct distortion well for partially-occluded faces.
Failure modes
- Our method fails for out-of-distribution faces, like extreme expressions, occluded faces, and faces with high-frequency details.
- GAN inversion may fall back on its learned prior and synthesize a face that deviates from the input, which can produce severe artifacts.
- GAN inversion may ignore high-frequency details and output a smoothed-out face.
Conclusions
- Presents a method for portrait perspective distortion correction
- Leverages a 3D GAN inversion method to recover facial geometry
- Explores optimization scheduling, focal length reparameterization, and closeup camera-to-face distance initialization
- Establishes a protocol of quantitative evaluation
- Quantitative and visual comparisons demonstrate improved performance over existing methods
- Editing ability improved with method
- Evaluated on Caltech Multi-Distance Portraits Dataset