Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- DDPMs can generate high-quality samples such as images and audio.
- DDPMs require hundreds to thousands of iterations to produce final samples.
- Prior works have attempted to accelerate DDPMs, but have not been able to maintain sample quality.
- We propose pseudo numerical methods for diffusion models (PNDMs).
- PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs.
Paper Content
Introduction
- Denoising Diffusion Probabilistic Models (DDPMs) model the data distribution through an iterative denoising process
- DDPMs have been applied to a variety of tasks, including image generation, text generation, 3D point cloud generation, text-to-speech, and image super-resolution
- Unlike GANs, which need careful hyperparameter tuning for different model structures and datasets, DDPMs can use similar model structures and are trained with a simple denoising objective
- Generating samples requires hundreds to thousands of iterations and one pass through a network at every step
- Recent works focus on improving the speed of the denoising process
- We introduce new numerical methods called pseudo numerical methods for diffusion models (PNDMs) to generate samples along a specific manifold
- PNDMs are second-order convergent while DDIMs are first-order convergent, making PNDMs 20x faster without loss of quality
Background
- Classical understanding of DDPMs
- Understanding based on Song et al. (2020b)
- Introduction of numerical methods used in paper
Denoising diffusion probabilistic models
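- DDPMs model the data distribution through an iterative denoising process that starts from Gaussian noise and ends at the image distribution.
- The variance schedule $\beta_t$ controls the speed of adding noise to the data; two neural networks, $\mu_\theta$ and $\beta_\theta$, parameterize the mean and variance of each denoising step.
- A denoising objective function is designed to help a neural network represent $\mu_\theta$.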
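For reference, the transitions and objective summarized above can be written in the standard notation of Ho et al. (2020). The symbols $\alpha_t$, $\bar\alpha_t$ and $\epsilon_\theta$ are assumptions carried over from that notation and do not appear in this summary; the paper's own indexing may differ.

```latex
\begin{aligned}
q(x_t \mid x_{t-1}) &= \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right), \\
p_\theta(x_{t-1} \mid x_t) &= \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \beta_\theta(x_t, t)\, I\right), \\
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s), \quad \epsilon \sim \mathcal{N}(0, I), \\
L(\theta) &= \mathbb{E}_{t,\,x_0,\,\epsilon}
\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\right) \right\|^2 .
\end{aligned}
```

In this form the network predicts the noise $\epsilon$, and the mean $\mu_\theta$ is recovered from that prediction.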
Stochastic differential equation
- DDPMs can be treated as solving a stochastic differential equation
- Anderson (1982) showed that the denoising process satisfies a similar stochastic differential equation
- Forward Euler, Runge-Kutta and Linear Multi-Step methods are numerical methods used to solve differential equations
- PNDMs combine a nonlinear transfer part and the gradient part of the Linear Multi-Step method
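The three classical methods named above can be compared on a toy problem. The sketch below is only an illustration: the test equation dy/dt = -y, the step count, and all function names are assumptions chosen for clarity, not anything taken from the paper.

```python
import math

def f(t, y):
    # Toy ODE (an assumption for illustration): dy/dt = -y, exact solution exp(-t).
    return -y

def rk4_step(t, y, h):
    # One classical fourth-order Runge-Kutta step.
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def forward_euler(y0, t0, t1, n):
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

def runge_kutta4(y0, t0, t1, n):
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = rk4_step(t, y, h)
        t += h
    return y

def adams_bashforth4(y0, t0, t1, n):
    # Four-step linear multi-step method. It reuses stored gradients, so it
    # needs three start-up steps, taken here with Runge-Kutta.
    h = (t1 - t0) / n
    t, y = t0, y0
    grads = []
    for _ in range(n):
        grads.append(f(t, y))
        if len(grads) < 4:
            y = rk4_step(t, y, h)
        else:
            g0, g1, g2, g3 = grads[-1], grads[-2], grads[-3], grads[-4]
            y += h / 24 * (55 * g0 - 59 * g1 + 37 * g2 - 9 * g3)
        t += h
    return y

exact = math.exp(-1.0)
methods = {"euler": forward_euler, "rk4": runge_kutta4, "ab4": adams_bashforth4}
errors = {name: abs(m(1.0, 0.0, 1.0, 20) - exact) for name, m in methods.items()}
for name, err in errors.items():
    print(f"{name}: absolute error {err:.2e}")
```

The linear multi-step method evaluates `f` once per step after start-up, while Runge-Kutta evaluates it four times per step. When each evaluation is a neural network pass, that difference is what makes the multi-step gradient part attractive.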
Formula transformation
Classical numerical method
- Classical numerical methods can introduce noise at high speedup rates.
- Diffusion models are well defined only in a limited region, but classical numerical methods generate results along a straight line and can leave that region.
- Equation (10) is unbounded in most cases, which violates the conditions that classical numerical methods assume.
Pseudo numerical method on manifold
- Problem 1: The equations should be solved on a certain manifold
- Problem 2: The target $x_0$ is unknown in the reverse process, and random terms are hard to handle
- Solution: Divide classical numerical methods into gradient and transfer parts
- Solution: Use nonlinear transfer part as pseudo numerical methods
- Solution: Combination of gradient and transfer parts solves both problems
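The split described above can be sketched for a scalar sample. Here the "gradient part" is the noise estimate `eps`, and the "transfer part" is the function that maps `(x_t, eps)` to the next sample. The closed form in `transfer` is an algebraic rewrite of the deterministic DDIM update (no added noise), which is one reading of the nonlinear transfer part. The names `ddim_step`, `transfer`, `a_t` and `a_prev` are hypothetical.

```python
import math

def ddim_step(x_t, eps, a_t, a_prev):
    # Deterministic DDIM update: predict x_0 from the noise estimate,
    # then re-noise it to the earlier (less noisy) level a_prev.
    x0_pred = (x_t - math.sqrt(1 - a_t) * eps) / math.sqrt(a_t)
    return math.sqrt(a_prev) * x0_pred + math.sqrt(1 - a_prev) * eps

def transfer(x_t, eps, a_t, a_prev):
    # The same update written as "scaled x_t minus a nonlinear coefficient times eps".
    # a_t and a_prev are cumulative noise levels in (0, 1], with a_prev > a_t.
    denom = math.sqrt(a_t) * (
        math.sqrt((1 - a_prev) * a_t) + math.sqrt((1 - a_t) * a_prev)
    )
    return math.sqrt(a_prev / a_t) * x_t - (a_prev - a_t) / denom * eps

# Both forms agree for arbitrary inputs (values below are made up).
x_t, eps, a_t, a_prev = 0.3, -1.2, 0.5, 0.8
difference = abs(ddim_step(x_t, eps, a_t, a_prev) - transfer(x_t, eps, a_t, a_prev))
print(f"difference between the two forms: {difference:.1e}")
```

Because `transfer` takes the noise estimate as an argument, any classical method can supply a better estimate of `eps` without changing how the sample is moved.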
Gradient part
- The gradient part can be taken freely from different classical numerical methods.
- Experiments and theoretical analyses show that the gradient part from different classical methods can work well with our new transfer part.
- We have provided three kinds of pseudo numerical methods.
- We use the gradient part of the linear multi-step method and our new transfer part as our main pseudo numerical methods for diffusion models.
Algorithm
- Algorithm 1 is the original DDIM denoising procedure
- Algorithm 2 is the new PNDM denoising procedure, which uses the pseudo linear multi-step and pseudo Runge-Kutta methods
- The pseudo linear multi-step method cannot start by itself because it needs results from earlier steps, so the pseudo Runge-Kutta method is used to compute the first three steps
- S-PNDMs use information from two steps at every step, while F-PNDMs use information from four steps
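The start-up and multi-step logic above can be sketched for scalar samples as follows. This is a reading of the bullets, not the paper's reference code. The weights (55, -59, 37, -9)/24 and (1, 2, 2, 1)/6 are the standard fourth-order Adams-Bashforth and Runge-Kutta weights. `alpha_bar`, `toy_eps_model`, `X0` and every function name are hypothetical stand-ins for a trained noise-prediction network and its schedule.

```python
import math

def alpha_bar(t):
    # Hypothetical continuous noise level on t in [0, 1]; alpha_bar(0) = 1 is clean data.
    return math.exp(-(0.1 * t + 9.95 * t * t))

def transfer(x, eps, t, t_next):
    # Nonlinear transfer part: deterministic DDIM-style move from t to t_next.
    a, a_next = alpha_bar(t), alpha_bar(t_next)
    denom = math.sqrt(a) * (math.sqrt((1 - a_next) * a) + math.sqrt((1 - a) * a_next))
    return math.sqrt(a_next / a) * x - (a_next - a) / denom * eps

def pseudo_runge_kutta(eps_model, x, t, t_next):
    # Gradient part from Runge-Kutta: four network evaluations per step.
    t_mid = (t + t_next) / 2
    e1 = eps_model(x, t)
    e2 = eps_model(transfer(x, e1, t, t_mid), t_mid)
    e3 = eps_model(transfer(x, e2, t, t_mid), t_mid)
    e4 = eps_model(transfer(x, e3, t, t_next), t_next)
    e = (e1 + 2 * e2 + 2 * e3 + e4) / 6
    return transfer(x, e, t, t_next), e1

def pseudo_linear_multistep(eps_model, x, t, t_next, history):
    # Gradient part from the linear multi-step method: one new evaluation,
    # combined with the three most recent stored estimates.
    e0 = eps_model(x, t)
    e = (55 * e0 - 59 * history[-1] + 37 * history[-2] - 9 * history[-3]) / 24
    return transfer(x, e, t, t_next), e0

def pndm_sample(eps_model, x_start, num_steps, t_start=1.0):
    # Requires num_steps >= 4 so the Runge-Kutta start-up never evaluates at t = 0.
    ts = [t_start * (1 - i / num_steps) for i in range(num_steps + 1)]
    x, history = x_start, []
    for i in range(num_steps):
        if i < 3:
            x, e = pseudo_runge_kutta(eps_model, x, ts[i], ts[i + 1])
        else:
            x, e = pseudo_linear_multistep(eps_model, x, ts[i], ts[i + 1], history)
        history.append(e)
    return x

X0 = 0.7  # toy "dataset": all data sits at this single point

def toy_eps_model(x, t):
    # Exact noise for that one-point dataset; stands in for a trained network.
    return (x - math.sqrt(alpha_bar(t)) * X0) / math.sqrt(1 - alpha_bar(t))

noise = 1.3
x_start = math.sqrt(alpha_bar(1.0)) * X0 + math.sqrt(1 - alpha_bar(1.0)) * noise
x_final = pndm_sample(toy_eps_model, x_start, num_steps=10)
print(f"recovered sample: {x_final:.6f} (data point is {X0})")
```

With a real network the Runge-Kutta start-up costs four evaluations per step, so it is used only for the three steps the multi-step method cannot cover.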
Convergence order
- Changing the transfer part of a numerical method can introduce unknown error.
- The local and global errors between the theoretical result and the new methods are computed.
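A convergence order can also be checked empirically by halving the step size and comparing global errors: for an order-p method the error shrinks by about 2^p. The sketch below shows that generic check on a toy ODE with a first-order method (Euler) and a second-order method (Heun). It illustrates the notion of order only and is not the paper's analysis of PNDMs; all names are hypothetical.

```python
import math

def integrate(step, n):
    # Integrate dy/dt = -y from t = 0 to t = 1 in n equal steps, starting at y = 1.
    h, y = 1.0 / n, 1.0
    for _ in range(n):
        y = step(y, h)
    return y

def euler_step(y, h):
    # First-order method.
    return y + h * (-y)

def heun_step(y, h):
    # Second-order method: average the slope at both ends of the step.
    k1 = -y
    k2 = -(y + h * k1)
    return y + h / 2 * (k1 + k2)

def observed_order(step, n=50):
    # Global error at t = 1 with step h and with h / 2; their log2 ratio estimates the order.
    exact = math.exp(-1.0)
    err_h = abs(integrate(step, n) - exact)
    err_half = abs(integrate(step, 2 * n) - exact)
    return math.log2(err_h / err_half)

euler_order = observed_order(euler_step)
heun_order = observed_order(heun_step)
print(f"Euler observed order: {euler_order:.2f}")
print(f"Heun observed order:  {heun_order:.2f}")
```

A second-order method reaches a given accuracy with far fewer steps than a first-order one, which is the intuition behind comparing second-order PNDMs with first-order DDIMs.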
Experiment
Setup
- Conducted unconditional image generation experiments on four datasets
- Used pre-trained models from prior works
- Number of total steps N is 1000
- Variance schedule is linear variance schedule
- Used pre-trained model for Cifar10 with cosine variance schedule
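The two variance schedules mentioned in the setup can be sketched as below. The end points 1e-4 and 0.02, the offset `s = 0.008` and the clip at 0.999 are the defaults commonly used in the prior DDPM and improved-DDPM works. They are assumed here and are not stated in this summary, and the function names are hypothetical.

```python
import math

def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Linear variance schedule: beta grows evenly from beta_start to beta_end.
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cosine_betas(num_steps=1000, s=0.008, max_beta=0.999):
    # Cosine variance schedule: the cumulative product follows a squared cosine,
    # and each beta is recovered from the ratio of consecutive cumulative values.
    def f(t):
        return math.cos((t / num_steps + s) / (1 + s) * math.pi / 2) ** 2
    return [min(1 - f(t + 1) / f(t), max_beta) for t in range(num_steps)]

def cumulative_alphas(betas):
    # alpha_bar_t = product of (1 - beta_s) for s up to t.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1 - b
        out.append(prod)
    return out

lin = linear_betas()
cos = cosine_betas()
lin_bar = cumulative_alphas(lin)
cos_bar = cumulative_alphas(cos)
print(f"linear schedule: alpha_bar at the midpoint = {lin_bar[499]:.4f}")
print(f"cosine schedule: alpha_bar at the midpoint = {cos_bar[499]:.4f}")
```

A sampler that takes 50 steps out of N = 1000 would read `alpha_bar` at every 20th index of such a list.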
Sample efficiency and quality
- Tested the pseudo linear multi-step methods on Cifar10 and CelebA, measured by Fréchet Inception Distance (FID)
- Compared against the results of the prior work DDIM
- Used same pre-trained models to test numerical methods
- Used models from iDDPMs to test nonlinear variance schedules
- Applied classical fourth-order numerical methods (FON) to Equation (10) with the model from DDIM
- Found FON limited when the number of steps is small
- New methods S-PNDM and F-PNDM improved results regardless of number of steps
- F-PNDM achieved lower FID with only 50 steps than DDIM with 1000 steps
- F-PNDM is 20x faster without losing quality
- F-PNDM improved the best FID by around 0.4 and achieved a new SOTA FID score of 2.71 on CelebA
- FID results of F-PNDM converge once the number of steps exceeds 250
- The cosine variance schedule lowered FID when a large number of steps is used
- Visualization results in Appendix A.11 and toy example in Appendix A.8
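For readers unfamiliar with the metric used throughout this section, FID compares Gaussian fits to Inception features of real and generated images (Heusel et al., 2017). The subscripts $r$ (real) and $g$ (generated) are notation assumed here; lower values are better.

```latex
\mathrm{FID} = \left\| \mu_r - \mu_g \right\|_2^2
+ \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Here $\mu$ and $\Sigma$ are the mean and covariance of the feature vectors for each image set.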
Discussion
- PNDMs are new numerical methods suitable for solving the ODEs of DDPMs
- PNDMs can generate high-quality images using fewer steps without loss of quality
- Can be used on linear and cosine variance schedules
- Can be used on various types of data
- Can be used to generate conditional samples
- Has second-order convergence
- Can reduce best FID of pre-trained models with shorter sampling time
- Can reduce best FID by 0.4 points on Cifar10 and CelebA
- Achieved SOTA FID score of 2.71 on CelebA