Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Standard diffusion models involve an image transform and an image restoration operator.
  • A family of generative models can be constructed by varying the choice of image degradation.
  • Diffusion models can be generalized to create generative models even when using deterministic degradations.
  • Code is available at a given URL.

Paper Content

Introduction

  • Diffusion models are powerful tools for generative modeling
  • Diffusion models are based on random noise removal
  • Diffusion models are understood as a random walk around the image density function
  • Variational inference with a Gaussian prior is used to derive the loss for the denoising network
  • Examining the need for Gaussian noise or any randomness for diffusion models to work
  • Considering models built around arbitrary image transformations
  • Generative behavior emerges when a sequence of updates is applied at test time
  • Cold diffusions require no Gaussian noise or any randomness during training or testing

Background

  • Generative models exist for natural language and images
  • GANs have been used for image synthesis
  • Diffusion models have become competitive for some applications
  • Noise is used in training and sampling pipelines
  • Noise is thought to expand the support of the low-dimensional training distribution
  • Noise is also thought to act as data augmentation
  • Iterative neural models have been used for inverse problems
  • Diffusion models have been applied to inverse problems
  • Noise is not a necessity in diffusion models
  • Feature space similarity metrics have been proposed to measure how closely generative models approximate the real training data

Generalized diffusion

  • Diffusion models have two components: an image degradation operator and a trained restoration operator.
  • Diffusion models alternate between applying these two operators.
  • This paper looks at generalized diffusions built around arbitrary degradation operations, which can be randomized or deterministic.

Model components and training

  • Image x0 is degraded by operator D with severity t
  • Output of degradation should vary continuously in t
  • Operator D can perform various transformations such as blurring, masking out pixels, downsampling
  • Restoration operator R inverts D and approximates x0
  • Restoration operator is implemented via a neural network parameterized by θ
  • Restoration network is trained via minimization problem

Sampling from the model

  • Choosing a degradation and training a model to perform restoration
  • Standard methods borrowed from diffusion literature can be used to invert severe degradations
  • For small degradations, a single application of the model can be used to obtain a restored image
  • Model is typically trained using a simple convex loss, which yields blurry results for large degradations
  • Diffusion models perform generation by iteratively applying the denoising operator and adding noise back to the image
  • Algorithms 1 and 2 can perfectly reconstruct the iterate if the restoration operator is a perfect inverse for the degradation operator
  • Analyzing the stability of these algorithms to errors in the restoration operator

Properties of algorithm 2

  • Algorithm 2 is tolerant of errors in the restoration operator R for small values of x and s.
  • The degradation function D(x, s) can be approximated by a linear function x + s • e.
  • Algorithm 2 produces the value x s = D(x 0 , s) for all s < t, regardless of the choice of R.
  • Algorithm 1 does not have this behavior and can incur errors for small values of s.

Generalized diffusions with various transformations

  • Reversing degradations and performing conditional generation
  • Extending methods to perform unconditional generation
  • Empirically evaluating generalized diffusion models
  • Experiments on vision tasks of deblurring, inpainting, super-resolution, and synthetic snow removal
  • Experiments on MNIST, CIFAR-10, and CelebA
  • Qualitative and quantitative results on held-out testing dataset
  • Measuring Frechet inception distance (FID) scores for degraded and reconstructed images

Deblurring

  • A generalized diffusion based on a Gaussian blur operation is considered.
  • A deblurring model is trained by minimizing a loss.
  • Algorithm 2 is used to invert the blurred diffusion process.
  • Qualitative and quantitative results are shown in Figure 3 and Table 1.
  • Sampling process brings the learned distribution closer to the true data manifold.
  • Sampling process adds frequencies that were removed during the degradation process.

Inpainting

  • Define a schedule of transforms to progressively grays-out pixels from an input image
  • Use a 2D Gaussian curve of variance β, discretized into an n x n array
  • Randomize the location of the Gaussian mask for MNIST and CIFAR-10, but keep it centered for CelebA
  • Control the amount of information removed at each step by tuning the β i parameter
  • Compare output of inpainting model to original image
  • Quantitatively assess effectiveness of inpainting models by comparing distributional similarity metrics before and after reconstruction

Super-resolution

  • Image is down-sampled by a factor of two in each direction
  • Final resolution is 4x4 for MNIST and CIFAR-10, 2x2 for Celeb-A
  • After each down-sampling, image is resized to original size using nearest-neighbor interpolation

Snowification

  • The purpose of the experiment is to show that generalized diffusion can work with transforms that don’t have scale-space and compositional properties.
  • The images were degraded by adding snow, with the level of snow increasing with each step.
  • The desnowified images had near-perfect reconstruction results for CIFAR-10 examples with lighter snow, and visually distinctive restoration for Celeb-A examples with heavy snow.

Cold generation

  • Diffusion models can learn underlying data distribution and generate high quality images
  • Deterministic generation using Gaussian noise discussed
  • Unconditional generation using deblurring discussed
  • Algorithm 2 can be extended to other degradations

Generation using deterministic noise degradation

  • Image generation using noise-based degradation is discussed
  • Two ways of applying Algorithm 2 with fixed noise are studied
  • Results for CelebA and AFHQ datasets using the fixed noise method and the estimated noise method are presented in Table 5

Image generation using blur

  • Forward diffusion process in noise-based diffusion models produces an isotropic Gaussian
  • Sampling from the isotropic Gaussian and denoising produces a new image
  • Blurred images form a simple distribution that can be modeled with a Gaussian mixture model
  • Every image degenerates to a constant value for large T
  • Constant value is the channel-wise mean of the RGB image
  • Gaussian mixture model is used to sample the channel-wise mean
  • Blurring schedule used with exponential rate of 0.01
  • Adding small amount of Gaussian noise improves quality of generated images

Generation using other transformations

  • Generation can be extended to other transformations, such as inpainting, super-resolution, and animorphosis.
  • A Gaussian mask is used to modify the masking routine so the final degraded image is completely devoid of information.
  • For super-resolution, the routine down-samples to a resolution of 2 × 2, or 4 values in each channel.
  • Animorphosis is a transformation where a human face is iteratively transformed to an animal face.

Conclusion

  • Existing diffusion models rely on Gaussian noise
  • This work replaces Gaussian noise with arbitrary transforms
  • Restores images afflicted by deterministic degradations such as blur, inpainting and downsampling
  • Framework paves the way for a more diverse landscape of diffusion models
  • Adam optimizer used with learning rate 2x10^-5
  • MNIST blurred recursively 40 times with 11x11 Gaussian kernel and standard deviation 7
  • CIFAR-10 blurred recursively with 11x11 Gaussian kernel and standard deviation 0.01*t+0.35
  • CelebA blurred with 15x15 Gaussian kernel and standard deviation 0.01*t
  • Inpainting models trained on different datasets with 60,000 gradient steps
  • Super-resolution model trained on different datasets for 700,000 iterations
  • Colorization task involves iteratively desaturating until fully gray-scale image
  • Snowification task involves linearly interpolating c0 and c1 between c start 0 and c end 0 and c start 1 and c end 1
  • Algorithm 2 produces higher quality images than other methods
  • Figure 21 shows 400 random images to demonstrate qualitative results