Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Standard diffusion models involve an image transform and an image restoration operator.
A family of generative models can be constructed by varying the choice of image degradation.
Diffusion models can be generalized to create generative models even when using deterministic degradations.
Code is available at a given URL.

Paper Content

Introduction

Diffusion models are powerful tools for generative modeling
Diffusion models are based on random noise removal
Diffusion models are understood as a random walk around the image density function
Variational inference with a Gaussian prior is used to derive the loss for the denoising network
Examining the need for Gaussian noise or any randomness for diffusion models to work
Considering models built around arbitrary image transformations
Generative behavior emerges when a sequence of updates is applied at test time
Cold diffusions require no Gaussian noise or any randomness during training or testing

Background

Generative models exist for natural language and images
GANs have been used for image synthesis
Diffusion models have become competitive for some applications
Noise is used in training and sampling pipelines
Noise is thought to expand the support of the low-dimensional training distribution
Noise is also thought to act as data augmentation
Iterative neural models have been used for inverse problems
Diffusion models have been applied to inverse problems
Noise is not a necessity in diffusion models
Feature space similarity metrics have been proposed to measure how closely generative models approximate the real training data

Generalized diffusion

Diffusion models have two components: an image degradation operator and a trained restoration operator.
Diffusion models alternate between applying these two operators.
This paper looks at generalized diffusions built around arbitrary degradation operations, which can be randomized or deterministic.

Model components and training

Image x0 is degraded by operator D with severity t
Output of degradation should vary continuously in t
Operator D can perform various transformations such as blurring, masking out pixels, downsampling
Restoration operator R inverts D and approximates x0
Restoration operator is implemented via a neural network parameterized by θ
Restoration network is trained via minimization problem

Sampling from the model

Choosing a degradation and training a model to perform restoration
Standard methods borrowed from diffusion literature can be used to invert severe degradations
For small degradations, a single application of the model can be used to obtain a restored image
Model is typically trained using a simple convex loss, which yields blurry results for large degradations
Diffusion models perform generation by iteratively applying the denoising operator and adding noise back to the image
Algorithms 1 and 2 can perfectly reconstruct the iterate if the restoration operator is a perfect inverse for the degradation operator
Analyzing the stability of these algorithms to errors in the restoration operator

Properties of algorithm 2

Algorithm 2 is tolerant of errors in the restoration operator R for small values of x and s.
The degradation function D(x, s) can be approximated by a linear function x + s • e.
Algorithm 2 produces the value x s = D(x 0 , s) for all s < t, regardless of the choice of R.
Algorithm 1 does not have this behavior and can incur errors for small values of s.

Generalized diffusions with various transformations

Reversing degradations and performing conditional generation
Extending methods to perform unconditional generation
Empirically evaluating generalized diffusion models
Experiments on vision tasks of deblurring, inpainting, super-resolution, and synthetic snow removal
Experiments on MNIST, CIFAR-10, and CelebA
Qualitative and quantitative results on held-out testing dataset
Measuring Frechet inception distance (FID) scores for degraded and reconstructed images

Deblurring

A generalized diffusion based on a Gaussian blur operation is considered.
A deblurring model is trained by minimizing a loss.
Algorithm 2 is used to invert the blurred diffusion process.
Qualitative and quantitative results are shown in Figure 3 and Table 1.
Sampling process brings the learned distribution closer to the true data manifold.
Sampling process adds frequencies that were removed during the degradation process.

Inpainting

Define a schedule of transforms to progressively grays-out pixels from an input image
Use a 2D Gaussian curve of variance β, discretized into an n x n array
Randomize the location of the Gaussian mask for MNIST and CIFAR-10, but keep it centered for CelebA
Control the amount of information removed at each step by tuning the β i parameter
Compare output of inpainting model to original image
Quantitatively assess effectiveness of inpainting models by comparing distributional similarity metrics before and after reconstruction

Super-resolution

Image is down-sampled by a factor of two in each direction
Final resolution is 4x4 for MNIST and CIFAR-10, 2x2 for Celeb-A
After each down-sampling, image is resized to original size using nearest-neighbor interpolation

Snowification

The purpose of the experiment is to show that generalized diffusion can work with transforms that don’t have scale-space and compositional properties.
The images were degraded by adding snow, with the level of snow increasing with each step.
The desnowified images had near-perfect reconstruction results for CIFAR-10 examples with lighter snow, and visually distinctive restoration for Celeb-A examples with heavy snow.

Cold generation

Diffusion models can learn underlying data distribution and generate high quality images
Deterministic generation using Gaussian noise discussed
Unconditional generation using deblurring discussed
Algorithm 2 can be extended to other degradations

Generation using deterministic noise degradation

Image generation using noise-based degradation is discussed
Two ways of applying Algorithm 2 with fixed noise are studied
Results for CelebA and AFHQ datasets using the fixed noise method and the estimated noise method are presented in Table 5

Image generation using blur

Forward diffusion process in noise-based diffusion models produces an isotropic Gaussian
Sampling from the isotropic Gaussian and denoising produces a new image
Blurred images form a simple distribution that can be modeled with a Gaussian mixture model
Every image degenerates to a constant value for large T
Constant value is the channel-wise mean of the RGB image
Gaussian mixture model is used to sample the channel-wise mean
Blurring schedule used with exponential rate of 0.01
Adding small amount of Gaussian noise improves quality of generated images

Generation using other transformations

Generation can be extended to other transformations, such as inpainting, super-resolution, and animorphosis.
A Gaussian mask is used to modify the masking routine so the final degraded image is completely devoid of information.
For super-resolution, the routine down-samples to a resolution of 2 × 2, or 4 values in each channel.
Animorphosis is a transformation where a human face is iteratively transformed to an animal face.

Conclusion

Existing diffusion models rely on Gaussian noise
This work replaces Gaussian noise with arbitrary transforms
Restores images afflicted by deterministic degradations such as blur, inpainting and downsampling
Framework paves the way for a more diverse landscape of diffusion models
Adam optimizer used with learning rate 2x10^-5
MNIST blurred recursively 40 times with 11x11 Gaussian kernel and standard deviation 7
CIFAR-10 blurred recursively with 11x11 Gaussian kernel and standard deviation 0.01*t+0.35
CelebA blurred with 15x15 Gaussian kernel and standard deviation 0.01*t
Inpainting models trained on different datasets with 60,000 gradient steps
Super-resolution model trained on different datasets for 700,000 iterations
Colorization task involves iteratively desaturating until fully gray-scale image
Snowification task involves linearly interpolating c0 and c1 between c start 0 and c end 0 and c start 1 and c end 1
Algorithm 2 produces higher quality images than other methods
Figure 21 shows 400 random images to demonstrate qualitative results

Link to paper#

Abstract#

Paper Content#

Introduction#

Background#

Generalized diffusion#

Model components and training#

Sampling from the model#

Properties of algorithm 2#

Generalized diffusions with various transformations#

Deblurring#

Inpainting#

Super-resolution#

Snowification#

Cold generation#

Generation using deterministic noise degradation#

Image generation using blur#

Generation using other transformations#

Conclusion#

Link to paper

Abstract

Paper Content

Introduction

Background

Generalized diffusion

Model components and training

Sampling from the model

Properties of algorithm 2

Generalized diffusions with various transformations

Deblurring

Inpainting

Super-resolution

Snowification

Cold generation

Generation using deterministic noise degradation

Image generation using blur

Generation using other transformations

Conclusion