Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Standard diffusion models involve an image transform (adding Gaussian noise) and an image restoration operator that inverts this degradation.
- A family of generative models can be constructed by varying the choice of image degradation.
- Diffusion models can be generalized to create generative models even when using deterministic degradations.
- Code is available at a given URL.
Paper Content
Introduction
- Diffusion models are powerful tools for generative modeling
- Diffusion models are based on random noise removal
- Diffusion models are understood as a random walk around the image density function
- Variational inference with a Gaussian prior is used to derive the loss for the denoising network
- Examining the need for Gaussian noise or any randomness for diffusion models to work
- Considering models built around arbitrary image transformations
- Generative behavior emerges when a sequence of updates is applied at test time
- Cold diffusions require no Gaussian noise or any randomness during training or testing
Background
- Generative models exist for natural language and images
- GANs have been used for image synthesis
- Diffusion models have become competitive for some applications
- Noise is used in training and sampling pipelines
- Noise is thought to expand the support of the low-dimensional training distribution
- Noise is also thought to act as data augmentation
- Iterative neural models have been used for inverse problems
- Diffusion models have been applied to inverse problems
- Noise is not a necessity in diffusion models
- Feature space similarity metrics have been proposed to measure how closely generative models approximate the real training data
Generalized diffusion
- Diffusion models have two components: an image degradation operator and a trained restoration operator.
- Diffusion models alternate between applying these two operators.
- This paper looks at generalized diffusions built around arbitrary degradation operations, which can be randomized or deterministic.
Model components and training
- Image x0 is degraded by operator D with severity t
- Output of degradation should vary continuously in t
- Operator D can perform various transformations such as blurring, masking out pixels, downsampling
- Restoration operator R inverts D and approximates x0
- Restoration operator is implemented via a neural network parameterized by θ
- Restoration network is trained via minimization problem
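The notation above can be collected into one worked statement. This is a sketch consistent with the bullets, not a quotation of the paper's equations: the symbol \mathcal{X} for the training distribution and the unspecified norm are assumptions (an \ell_1 norm is one common choice).

```latex
% Degradation with severity t; severity 0 leaves the image unchanged
x_t = D(x_0, t), \qquad D(x_0, 0) = x_0

% The restoration operator approximately inverts D
R_\theta(x_t, t) \approx x_0

% Training: minimize the expected restoration error over images x
\min_\theta \; \mathbb{E}_{x \sim \mathcal{X}}
  \left\| R_\theta\big(D(x, t), t\big) - x \right\|
```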
Sampling from the model
- Choosing a degradation and training a model to perform restoration
- Standard methods borrowed from diffusion literature can be used to invert severe degradations
- For small degradations, a single application of the model can be used to obtain a restored image
- Model is typically trained using a simple convex loss, which yields blurry results for large degradations
- Diffusion models perform generation by iteratively applying the denoising operator and adding noise back to the image
- Algorithm 1 (naive sampling) and Algorithm 2 (improved sampling) both perfectly reconstruct the iterate x_s for s < t if the restoration operator is a perfect inverse for the degradation operator
- Analyzing the stability of these algorithms to errors in the restoration operator
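The two sampling loops can be sketched as follows. This is a minimal illustration, assuming Algorithm 1 re-degrades the restored image to severity s - 1 and Algorithm 2 instead moves the iterate by the difference of two degradations. The function names `sample_naive` and `sample_improved`, the toy linear degradation, and the hand-written perfect inverse `R_perfect` are all hypothetical stand-ins for a trained network.

```python
import numpy as np

def sample_naive(x_t, t, D, R):
    """Algorithm 1 (sketch): restore, then re-degrade to severity s - 1."""
    x = x_t
    for s in range(t, 0, -1):
        x0_hat = R(x, s)          # current estimate of the clean image
        x = D(x0_hat, s - 1)      # degrade the estimate slightly less
    return x

def sample_improved(x_t, t, D, R):
    """Algorithm 2 (sketch): update by the difference of two degradations."""
    x = x_t
    for s in range(t, 0, -1):
        x0_hat = R(x, s)
        x = x - D(x0_hat, s) + D(x0_hat, s - 1)
    return x

# Toy setup (assumption): a linear degradation and its exact inverse.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)
e = rng.standard_normal(16)
D = lambda x, s: x + s * e
R_perfect = lambda x, s: x - s * e

T = 10
x_T = D(x0, T)
out_naive = sample_naive(x_T, T, D, R_perfect)
out_improved = sample_improved(x_T, T, D, R_perfect)
```

With a perfect inverse, both loops return the original `x0`; the two differ only in how they respond to an imperfect restoration operator.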
Properties of algorithm 2
- Algorithm 2 is tolerant of errors in the restoration operator R for small values of x and s.
- For small x and s, the degradation function D(x, s) can be approximated by the linear function x + s · e for some vector e.
- For such a linear degradation, Algorithm 2 produces the value x_s = D(x_0, s) for all s < t, regardless of the choice of R.
- Algorithm 1 does not have this behavior and can incur errors for small values of s.
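The claim for the linear model can be checked in one step. The derivation below is a sketch that assumes the Algorithm 2 update x_{s-1} = x_s - D(R(x_s, s), s) + D(R(x_s, s), s - 1) and the exactly linear degradation D(x, s) = x + s e, and it starts from an iterate that already satisfies x_s = D(x_0, s).

```latex
\begin{aligned}
x_{s-1} &= x_s - D\big(R(x_s, s), s\big) + D\big(R(x_s, s), s-1\big) \\
        &= (x_0 + s\,e) - \big(R(x_s, s) + s\,e\big) + \big(R(x_s, s) + (s-1)\,e\big) \\
        &= x_0 + (s-1)\,e \\
        &= D(x_0, s-1)
\end{aligned}
```

The restoration output R(x_s, s) cancels, so its error never reaches the iterate. Under the same assumptions, the Algorithm 1 update x_{s-1} = D(R(x_s, s), s - 1) = R(x_s, s) + (s - 1) e keeps the restoration error in every step.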
Generalized diffusions with various transformations
- Reversing degradations and performing conditional generation
- Extending methods to perform unconditional generation
- Empirically evaluating generalized diffusion models
- Experiments on vision tasks of deblurring, inpainting, super-resolution, and synthetic snow removal
- Experiments on MNIST, CIFAR-10, and CelebA
- Qualitative and quantitative results on held-out testing dataset
- Measuring Frechet inception distance (FID) scores for degraded and reconstructed images
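For reference, the standard definition of FID compares Gaussian fits to Inception features of two image sets. The symbols below (\mu_r, \Sigma_r for the reference images and \mu_g, \Sigma_g for the degraded or reconstructed images) are notation chosen here, not taken from these notes.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
  - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the two feature distributions are closer.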
Deblurring
- A generalized diffusion based on a Gaussian blur operation is considered.
- A deblurring model is trained by minimizing a loss.
- Algorithm 2 is used to invert the blurred diffusion process.
- Qualitative and quantitative results are shown in Figure 3 and Table 1.
- Sampling process brings the learned distribution closer to the true data manifold.
- Sampling process adds frequencies that were removed during the degradation process.
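A recursive Gaussian blur degradation can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the paper's code: the helper names are hypothetical, edge padding is a choice made here, and the defaults (11x11 kernel, standard deviation 0.01*t + 0.35) borrow the CIFAR-10 values listed in the training details of these notes.

```python
import numpy as np

def gaussian_kernel1d(size, sigma):
    """Normalized 1D Gaussian kernel; `size` is assumed to be odd."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-0.5 * (ax / sigma) ** 2)
    return k / k.sum()

def blur_once(img, size, sigma):
    """Separable Gaussian blur of a 2D array, using edge padding."""
    k = gaussian_kernel1d(size, sigma)
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    # Blur along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def blur_degrade(x0, t, size=11, sigma_fn=lambda s: 0.01 * s + 0.35):
    """D(x0, t): apply t successive blurs; t = 0 returns x0 unchanged."""
    out = np.asarray(x0, dtype=float)
    for s in range(1, t + 1):
        out = blur_once(out, size, sigma_fn(s))
    return out

rng = np.random.default_rng(0)
x0 = rng.random((16, 16))
x_mild = blur_degrade(x0, 5)
x_heavy = blur_degrade(x0, 40)
```

Each blur removes more high-frequency content, so the pixel variance shrinks as t grows; these are the frequencies the sampling process has to add back.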
Inpainting
- Define a schedule of transforms that progressively gray out pixels from an input image
- Use a 2D Gaussian curve of variance β, discretized into an n x n array
- Randomize the location of the Gaussian mask for MNIST and CIFAR-10, but keep it centered for CelebA
- Control the amount of information removed at each step by tuning the β_i parameter
- Compare output of inpainting model to original image
- Quantitatively assess effectiveness of inpainting models by comparing distributional similarity metrics before and after reconstruction
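The masking schedule can be sketched as below. The construction is an assumption that fills in details these notes omit: the Gaussian is normalized to peak at 1 and subtracted from 1, so the mask is 0 at its center, and successive masks are applied by multiplication. The function names and the `betas` schedule are hypothetical, and square single-channel images are assumed.

```python
import numpy as np

def gaussian_mask(n, beta, center=None):
    """n x n mask: 0 at the center, approaching 1 far from it."""
    if center is None:
        center = ((n - 1) / 2.0, (n - 1) / 2.0)
    ii, jj = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    d2 = (ii - center[0]) ** 2 + (jj - center[1]) ** 2
    # Gaussian of variance beta with peak value 1, subtracted from 1.
    return 1.0 - np.exp(-d2 / (2.0 * beta))

def mask_degrade(x0, t, betas, center=None):
    """D(x0, t): multiply by the first t masks; t = 0 returns x0."""
    out = np.asarray(x0, dtype=float).copy()
    n = out.shape[0]
    for i in range(t):
        out = out * gaussian_mask(n, betas[i], center)
    return out

betas = np.linspace(1.0, 8.0, 10)   # hypothetical increasing schedule
x0 = np.ones((9, 9))
x_3 = mask_degrade(x0, 3, betas)
x_10 = mask_degrade(x0, 10, betas)
```

Passing a random `center` corresponds to the randomized mask location used for MNIST and CIFAR-10, while leaving it as `None` keeps the mask centered as for CelebA.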
Super-resolution
- Image is down-sampled by a factor of two in each direction
- Final resolution is 4x4 for MNIST and CIFAR-10, 2x2 for CelebA
- After each down-sampling, image is resized to original size using nearest-neighbor interpolation
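The down-sample-then-resize degradation can be sketched as follows. The notes do not say how the down-sampling is computed, so 2x2 average pooling is an assumption here; the nearest-neighbor resize is implemented with `np.repeat`. Function names are hypothetical, and a single-channel image whose side is divisible by 2^t is assumed.

```python
import numpy as np

def downsample2(img):
    """Halve each spatial dimension by 2x2 average pooling (assumption)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def sr_degrade(x0, t):
    """D(x0, t): down-sample t times, then resize to the original size."""
    small = np.asarray(x0, dtype=float)
    for _ in range(t):
        small = downsample2(small)
    factor = 2 ** t
    # Nearest-neighbor upsampling: repeat every pixel `factor` times per axis.
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

x0 = np.arange(32 * 32, dtype=float).reshape(32, 32)
x_deg = sr_degrade(x0, 3)   # 32x32 -> 4x4 -> resized back to 32x32
```

For a 32x32 input, three steps leave a 4x4 grid of constant 8x8 blocks, matching the final resolution quoted for MNIST and CIFAR-10.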
Snowification
- The purpose of the experiment is to show that generalized diffusion can work with transforms that don’t have scale-space and compositional properties.
- The images were degraded by adding snow, with the level of snow increasing with each step.
- The desnowified images had near-perfect reconstruction results for CIFAR-10 examples with lighter snow, and visually distinctive restoration for CelebA examples with heavy snow.
Cold generation
- Diffusion models can learn underlying data distribution and generate high quality images
- Deterministic generation using Gaussian noise discussed
- Unconditional generation using deblurring discussed
- Algorithm 2 can be extended to other degradations
Generation using deterministic noise degradation
- Image generation using noise-based degradation is discussed
- Two ways of applying Algorithm 2 with fixed noise are studied
- Results for CelebA and AFHQ datasets using the fixed noise method and the estimated noise method are presented in Table 5
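The two variants can be written in the usual noise-based parameterization. This is a sketch under assumptions: the schedule \alpha_t and the exact form of the degradation are the standard ones from the diffusion literature and are not given in these notes.

```latex
% Noise-based degradation with a noise vector z
D(x_0, t) = \sqrt{\alpha_t}\, x_0 + \sqrt{1 - \alpha_t}\, z

% Fixed noise: z is sampled once and reused at every step of Algorithm 2.

% Estimated noise: z is re-estimated from the current iterate at each step
\hat{z}(x_t, t) = \frac{x_t - \sqrt{\alpha_t}\, R(x_t, t)}{\sqrt{1 - \alpha_t}}
```

In both cases no fresh randomness is injected during sampling, so the generation is deterministic given the starting point.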
Image generation using blur
- Forward diffusion process in noise-based diffusion models produces an isotropic Gaussian
- Sampling from the isotropic Gaussian and denoising produces a new image
- Blurred images form a simple distribution that can be modeled with a Gaussian mixture model
- Every image degenerates to a constant value for large T
- Constant value is the channel-wise mean of the RGB image
- Gaussian mixture model is used to sample the channel-wise mean
- Blurring schedule used with exponential rate of 0.01
- Adding small amount of Gaussian noise improves quality of generated images
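Drawing starting points for blur-based generation can be sketched as below. The mixture parameters are hypothetical placeholders; in practice they would be fitted to the channel-wise means of the training images (for example with an EM-based mixture fit), which this sketch does not do. The function names and the `noise_std` parameter are also assumptions.

```python
import numpy as np

def sample_gmm(weights, means, covs, n, rng):
    """Draw n RGB mean vectors from a Gaussian mixture."""
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.stack(
        [rng.multivariate_normal(means[k], covs[k]) for k in comps])

def initial_images(weights, means, covs, n, size, noise_std, rng):
    """Constant-color images from sampled means, plus optional small noise."""
    rgb = sample_gmm(weights, means, covs, n, rng)            # shape (n, 3)
    imgs = np.broadcast_to(
        rgb[:, None, None, :], (n, size, size, 3)).copy()
    # Small Gaussian noise breaks the exact symmetry of a constant image.
    imgs += noise_std * rng.standard_normal(imgs.shape)
    return imgs

# Hypothetical two-component mixture over RGB means.
weights = np.array([0.6, 0.4])
means = np.array([[0.5, 0.4, 0.35], [0.3, 0.3, 0.3]])
covs = np.array([np.eye(3) * 0.01, np.eye(3) * 0.02])

rng = np.random.default_rng(0)
flat = initial_images(weights, means, covs, 4, 8, 0.0, rng)
noisy = initial_images(weights, means, covs, 4, 8, 0.002, rng)
```

The resulting arrays play the role of the fully degraded image at severity T, from which the sampling loop would start.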
Generation using other transformations
- Generation can be extended to other transformations, such as inpainting, super-resolution, and animorphosis.
- A Gaussian mask is used to modify the masking routine so the final degraded image is completely devoid of information.
- For super-resolution, the routine down-samples to a resolution of 2 × 2, or 4 values in each channel.
- Animorphosis is a transformation where a human face is iteratively transformed to an animal face.
Conclusion
- Existing diffusion models rely on Gaussian noise
- This work replaces Gaussian noise with arbitrary transforms
- Images afflicted by deterministic degradations such as blur, masking, and downsampling can still be restored
- Framework paves the way for a more diverse landscape of diffusion models
Experimental details
- Adam optimizer used with learning rate 2x10^-5
- MNIST blurred recursively 40 times with 11x11 Gaussian kernel and standard deviation 7
- CIFAR-10 blurred recursively with 11x11 Gaussian kernel and standard deviation 0.01*t+0.35
- CelebA blurred with 15x15 Gaussian kernel and standard deviation 0.01*t
- Inpainting models trained on different datasets with 60,000 gradient steps
- Super-resolution model trained on different datasets for 700,000 iterations
- Colorization task involves iteratively desaturating until fully gray-scale image
- Snowification task involves linearly interpolating c_0 between c_0^start and c_0^end, and c_1 between c_1^start and c_1^end
- Algorithm 2 produces higher quality images than other methods
- Figure 21 shows 400 random images to demonstrate qualitative results