Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

StyleGAN is limited to cropped aligned faces at a fixed image resolution.
Dilated convolutions can be used to rescale the receptive fields of shallow layers in StyleGAN.
This allows fixed-size small features at shallow layers to be extended into larger ones that can accommodate variable resolutions.
An encoder is introduced to enable real face inversion and manipulation.

Paper Content

Related work

StyleGAN inversion aims to project real face images into the latent space of StyleGAN
StyleGAN-based face manipulation can be done by optimizing the latent code or searching for offline editing vectors
Image-to-image translation frameworks can also be used to achieve face manipulation
StyleGANEX can manipulate mid-layer features with various resolutions and supports diverse manipulation tasks

StyleGANEX

StyleGAN has a fixed-crop limitation
StyleGAN2 is the focus of most existing StyleGAN-based manipulation models

Analysis of the fixed-crop limitation

StyleGAN has potential for handling normal FoV face images
Generator of StyleGAN is a fully convolutional architecture that can handle different feature resolutions
Translation equivariance of convolution operations supports feature translation
Limiting factor is first-layer feature with fixed resolution of 4x4
Sub-pixel translation fuses adjacent feature values, resulting in blurry face
First-layer feature inadequate to characterize spatial information of unaligned faces
StyleGANEX expands the shallow layers to the same resolution as the 7th layer, enabling face manipulation beyond cropped and aligned faces
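
The role of dilation in the points above can be illustrated with a minimal 1-D sketch. It is not the StyleGANEX implementation: the helper names, the 3-tap kernel, the 32-sample feature and the dilation factor of 8 are all assumptions chosen for illustration. It only shows that the same kernel weights cover a wider span of a larger feature once dilation is applied, so no new weights are needed.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded, same-length 1-D cross-correlation with dilation."""
    half = dilation * (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - half + j * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


# Assumed setting: a kernel designed for a tiny feature is reused,
# unchanged, on a feature that is 8 times larger.
span_plain = effective_kernel_size(3, 1)
span_dilated = effective_kernel_size(3, 8)

# An impulse in the middle of a 32-sample feature shows which output
# positions the dilated kernel connects to that single input sample.
impulse = [0.0] * 32
impulse[16] = 1.0
response = dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=8)
taps = [i for i, value in enumerate(response) if value != 0.0]

print(span_plain, span_dilated, taps)
```

With these assumed values, the three kernel taps land 8 samples apart, so the kernel covers 17 samples instead of 3 while keeping its original weights.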

Face manipulation with StyleGANEX

StyleGANEX encoder

StyleGANEX encoder E is used to project real face images into the joint W+-F space of StyleGANEX G
E builds upon the pSp encoder
For F space, a convolution layer is added to map concatenated features to the first-layer input feature f of G
For W+ space, the original pSp encoder takes a 256 x 256 image as input and convolves it to eighteen 1 x 1 x 512 features
Global average pooling is added to resize all features to 1 x 1 x 512 before mapping to latent codes
A scalar parameter indicates how many shallow layers of G receive encoder features through skip connections
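
The effect of the added global average pooling can be sketched as follows. This is a toy illustration rather than the encoder's code: features are plain nested lists in [channel][row][column] order, 4 channels stand in for 512, and both helper names are hypothetical. The point is that pooling over height and width yields one value per channel regardless of the spatial size of the input.

```python
def global_average_pool(feature):
    """Average a [C][H][W] feature map over H and W, returning C values."""
    pooled = []
    for channel in feature:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def make_feature(channels, height, width, value):
    """Hypothetical helper: a constant feature map for the demonstration."""
    return [[[float(value)] * width for _ in range(height)]
            for _ in range(channels)]


# A 1 x 1 feature (the fixed-size case) and a 3 x 5 feature (a larger,
# non-square input) both collapse to one value per channel.
small = global_average_pool(make_feature(4, 1, 1, 2.0))
large = global_average_pool(make_feature(4, 3, 5, 2.0))

print(len(small), len(large))
```

Because the pooled output has the same length in both cases, the layers that map features to latent codes can stay unchanged when the input resolution varies.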

StyleGANEX inversion and editing

We perform a two-step StyleGANEX inversion to find appropriate f and ŵ+ that precisely reconstruct a target image x.
Step I projects x to an initial f and w+ with E. Step II optimizes f and w+ to reduce the reconstruction error.
To predict a more accurate w+, the cropped aligned face region of x is used as the encoder input instead of x itself.
We measure the distance between the reconstructed image and the target x in terms of pixel similarity, perceptual similarity, and identity preservation.
We can perform flexible editing over x as in StyleGAN.
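
The two-step procedure above can be sketched on a toy problem. Everything here is a stand-in: `generator` and `encoder` are hypothetical scalar-list functions rather than StyleGANEX components, the objective is a plain mean squared error instead of the paper's losses, and the learning rate is arbitrary. The 500 refinement steps mirror the iteration count mentioned in the ablation notes.

```python
def generator(f, w):
    """Toy stand-in for G: content features f scaled by a style code w."""
    return [w * value for value in f]


def encoder(x):
    """Toy stand-in for E: a deliberately rough initial guess for (f, w)."""
    return [0.9 * value for value in x], 1.0


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def invert(x, steps=500, lr=0.1):
    """Step I: encode x. Step II: refine (f, w) by gradient descent on MSE."""
    f, w = encoder(x)
    n = len(x)
    for _ in range(steps):
        residual = [r - t for r, t in zip(generator(f, w), x)]
        grad_f = [2.0 * w * r / n for r in residual]
        grad_w = sum(2.0 * r * value for r, value in zip(residual, f)) / n
        f = [value - lr * g for value, g in zip(f, grad_f)]
        w = w - lr * grad_w
    return f, w


x = [0.2, -0.5, 0.8, 0.1]
f0, w0 = encoder(x)
initial_error = mse(generator(f0, w0), x)
f_hat, w_hat = invert(x)
final_error = mse(generator(f_hat, w_hat), x)

print(initial_error, final_error)
```

The encoder alone leaves a visible reconstruction error, and the optimization step removes it, which is the division of labor the two steps describe.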

StyleGANEX-based translation

End-to-end image-to-image translation framework
Can be trained to do different face manipulation tasks
Face super-resolution: train encoder to recover high-resolution image from low-resolution image
Sketch/mask-to-face translation: use trainable light-weight translation network to map source to intermediate domain
Video face editing: train encoder to edit face with editing vector
Video toonification: use StyleGAN fine-tuned on cartoon images
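
Editing with an editing vector, as mentioned for video face editing, is commonly expressed as adding a scaled direction to the latent codes. The sketch below shows only that arithmetic. The function name, the 4-dimensional codes (standing in for 512), the edit direction and the choice of layers are all hypothetical, and the trained encoder described above is not modeled.

```python
def apply_editing_vector(w_plus, direction, strength, layers=None):
    """Return a copy of w_plus with strength * direction added to the chosen layers."""
    selected = set(range(len(w_plus)) if layers is None else layers)
    edited = []
    for index, code in enumerate(w_plus):
        if index in selected:
            edited.append([c + strength * d for c, d in zip(code, direction)])
        else:
            edited.append(list(code))
    return edited


# Eighteen latent codes, matching the count produced by the pSp encoder.
w_plus = [[0.0, 0.0, 0.0, 0.0] for _ in range(18)]
direction = [1.0, 0.0, -1.0, 0.5]  # hypothetical attribute direction

edited = apply_editing_vector(w_plus, direction, strength=2.0, layers=range(4, 8))

print(edited[4], edited[0])
```

Restricting the edit to a subset of layers is one way to change an attribute while leaving the codes of the other layers untouched. For video, the same edit would be applied to the codes of every frame.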

Experimental results

Loss weights λ2, λ3, λ4, λ5 and λ6 are set per task
Translation network consists of two downsampling convolutional layers, two ResBlocks and two upsampling convolutional layers
Experiments performed on single NVIDIA Tesla V100 GPU
Training data from FFHQ, augmented with random geometric transformations
Testing data from FaceForensics++, Unsplash and Pexels
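
The translation network layout listed above (two downsampling convolutions, two ResBlocks, two upsampling convolutions) can be checked with a small shape trace. The kernel size of 3, stride of 2, padding of 1 and the 2x upsampling factor are assumptions not given in the summary, and the function names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def translation_network_shapes(height, width):
    """Trace (name, height, width) through the assumed layer layout."""
    shapes = [("input", height, width)]
    h, w = height, width
    for i in range(2):  # two stride-2 downsampling convolutions
        h, w = conv_out(h, stride=2), conv_out(w, stride=2)
        shapes.append((f"down{i + 1}", h, w))
    for i in range(2):  # two ResBlocks keep the resolution
        h, w = conv_out(h), conv_out(w)
        shapes.append((f"resblock{i + 1}", h, w))
    for i in range(2):  # two 2x upsampling layers
        h, w = 2 * h, 2 * w
        shapes.append((f"up{i + 1}", h, w))
    return shapes


shapes = translation_network_shapes(256, 320)
for name, h, w in shapes:
    print(f"{name:>10}: {h} x {w}")
```

Under these assumptions the output resolution matches the input whenever both sides are divisible by 4, which is what lets the network map a source image to an intermediate domain of the same size.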

Face manipulation

Face editing works well on StyleGANEX
Compared with other baselines, our approach processes the entire image as a whole and avoids discontinuity issues
Our method successfully edits facial attributes and styles
32× super-resolution results show both face and non-face regions are reasonably restored
Our method surpasses other methods in precise detail restoration and uniform super-resolution
Our method successfully translates whole images and achieves realism and structural consistency to the inputs
Our method receives the best score in a user study
Our method preserves more details of the non-face region and generates sharper faces than VToonify-T

Ablation study

Step II of two-step inversion verified in Fig. 6
Step I studied in Fig. 14
500-iteration optimization for precise reconstruction
Domain transfer to Disney Princess
Directly optimizing a mean w+ and a random f gives poor results even after 2,000 iterations
Input choice studied in Fig. 15
Cropped aligned faces default choice
Reasonable results with cropped input to decrease background proportion
Skip connection studied in Fig. 16
Smaller skip connection to enhance model robustness to low-quality inputs

Limitations

Relies on inefficient optimization process for precise reconstruction
Focuses on overcoming fixed-crop limitation of StyleGAN, not GAN inversion
Limited by feature representation of StyleGAN
May not handle out-of-distribution features
Focuses on face manipulation, not non-facial regions
May inherit model bias of StyleGAN

Conclusion

Presented an approach to refactor StyleGAN to overcome fixed-crop limitation while retaining style control abilities
Refactored model called StyleGANEX fully inherits parameters of pre-trained StyleGAN without retraining
Introduced StyleGANEX encoder to project normal FoV face images to joint W+-F space of StyleGANEX for real face inversion and manipulation
Training encoder uses one NVIDIA Tesla V100 GPU for 100,000 iterations
Training time is about 2 days for 100,000 iterations
Image inference uses one NVIDIA Tesla V100 GPU and a batch size of 1
Inference on 796 testing images takes about 107.11 s
Normal FoV face super-resolution results show detail restoration and uniform super-resolution without discontinuity between face and non-face regions
Sketch/mask-to-face translation results show realism and structural consistency to the inputs
StyleGANEX inversion and facial attribute/style editing results show full background with target style
Comparison results show better scores for StyleGANEX encoder and full two-step inversion
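
The training and inference timings reported above can be converted into per-item figures with simple arithmetic. The sketch treats "about 2 days" as exactly 48 hours, so the per-iteration value is an approximation, and the variable names are illustrative.

```python
# Training: about 2 days for 100,000 iterations (treated as exactly 48 hours).
train_seconds = 2 * 24 * 60 * 60
train_iterations = 100_000
seconds_per_iteration = train_seconds / train_iterations

# Inference: 796 test images in about 107.11 s with a batch size of 1.
inference_seconds = 107.11
num_images = 796
seconds_per_image = inference_seconds / num_images
images_per_second = num_images / inference_seconds

print(f"{seconds_per_iteration:.3f} s per training iteration")
print(f"{seconds_per_image:.4f} s per image")
print(f"{images_per_second:.2f} images per second")
```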
