Abstract

  • DDPMs can generate high-quality samples such as images and audio.
  • DDPMs require hundreds to thousands of iterations to produce final samples.
  • Prior works have attempted to accelerate DDPMs, but have not been able to maintain sample quality.
  • We propose pseudo numerical methods for diffusion models (PNDMs).
  • PNDMs can generate higher quality synthetic images with only 50 steps compared with 1000-step DDIMs.

Paper Content

Introduction

  • Denoising Diffusion Probabilistic Models (DDPMs) model the data distribution through an iterative denoising process
  • DDPMs have been applied to a variety of applications, including image generation, text generation, 3D point cloud generation, text-to-speech and image super-resolution
  • Unlike GANs, DDPMs can use similar model structures and be trained by a simple denoising objective
  • Generating samples requires hundreds to thousands of iterations and one pass through a network at every step
  • Recent works focus on improving the speed of the denoising process
  • We introduce new numerical methods called pseudo numerical methods for diffusion models (PNDMs) to generate samples along a specific manifold
  • PNDMs are second-order convergent while DDIMs are first-order convergent, making PNDMs 20x faster without loss of quality

Background

  • Classical understanding of DDPMs
  • Understanding based on Song et al. (2020b)
  • Introduction of numerical methods used in paper

Denoising diffusion probabilistic models

  • DDPMs map a Gaussian distribution to the image distribution through an iterative denoising process.
  • A variance schedule β_t controls the speed at which noise is added to the data in the forward process.
  • Two neural networks, μ_θ and β_θ, parameterize the mean and variance of the Gaussian transitions in the reverse process.
  • An objective function is designed to train the network that represents μ_θ.
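
The forward noising step and the denoising objective can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes the common noise-prediction parameterization (a network ε_θ from which μ_θ is derived), and the names `q_sample`, `denoising_loss`, and the schedule values 1e-4 to 0.02 are assumptions made for the example.

```python
import numpy as np

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from the forward process in closed form.

    alpha_bar[t] is the cumulative product of (1 - beta) up to step t.
    Returns the noisy sample and the noise that was added.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def denoising_loss(eps_model, x0, t, alpha_bar, rng):
    """Mean squared error between the true noise and the predicted noise."""
    x_t, eps = q_sample(x0, t, alpha_bar, rng)
    return float(np.mean((eps - eps_model(x_t, t)) ** 2))

# Assumed linear variance schedule with 1000 steps (illustrative values).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(10_000)          # stand-in for a batch of data
zero_model = lambda x_t, t: np.zeros_like(x_t)  # hypothetical untrained model
loss = denoising_loss(zero_model, x0, 500, alpha_bar, rng)
```

A model that always predicts zero noise gives a loss close to the variance of the noise itself, which is the baseline a trained network has to beat.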

Stochastic differential equation

  • DDPMs can be treated as solving a stochastic differential equation
  • Anderson (1982) showed that the denoising (reverse) process satisfies a similar stochastic differential equation
  • Forward Euler, Runge-Kutta and Linear Multi-Step methods are numerical methods used to solve differential equations
  • PNDMs combine a nonlinear transfer part and the gradient part of the Linear Multi-Step method
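
For readers unfamiliar with the three classical methods named above, here is a small sketch on a hypothetical toy ODE, dx/dt = -x. It is not taken from the paper: the function names and the test problem are assumptions, and the linear multi-step variant shown is the standard four-step Adams-Bashforth formula, started with Runge-Kutta steps.

```python
import numpy as np

def rk4_step(f, x, t, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x, t)
    k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(x + h * k3, t + h)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(f, x0, t0, t1, n, method):
    """Integrate dx/dt = f(x, t) from t0 to t1 in n equal steps.

    method: "euler", "rk4", or "lms" (four-step Adams-Bashforth).
    """
    h = (t1 - t0) / n
    x, t, grads = x0, t0, []
    for i in range(n):
        grads.append(f(x, t))  # gradient at the current point
        if method == "euler":
            x = x + h * grads[-1]
        elif method == "rk4" or i < 3:
            # The multi-step formula needs four stored gradients,
            # so its first three steps fall back to Runge-Kutta.
            x = rk4_step(f, x, t, h)
        else:
            x = x + h * (55 * grads[-1] - 59 * grads[-2]
                         + 37 * grads[-3] - 9 * grads[-4]) / 24
        t = t + h
    return x

decay = lambda x, t: -x      # toy problem with exact solution exp(-t)
exact = np.exp(-1.0)
errors = {m: abs(integrate(decay, 1.0, 0.0, 1.0, 20, m) - exact)
          for m in ("euler", "rk4", "lms")}
```

After the start-up phase the multi-step method evaluates `f` once per step, while Runge-Kutta evaluates it four times. That trade-off matters when each evaluation is a full network pass.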

Formula transformation

Classical numerical method

  • Classical numerical methods can introduce noise at high speedup rates.
  • Diffusion models are well-defined only in a limited region, but numerical methods generate results along a straight line.
  • Equation (10) is unbounded in most cases, so it does not satisfy the conditions that classical numerical methods require.

Pseudo numerical method on manifold

  • Problem 1: We should try to solve our problems on certain manifolds
  • Problem 2: We don’t know the target x_0 in the reverse process, and random terms are hard to handle
  • Solution: Divide classical numerical methods into gradient and transfer parts
  • Solution: Use nonlinear transfer part as pseudo numerical methods
  • Solution: Combination of gradient and transfer parts solves both problems
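
A sketch of a nonlinear transfer part is given below. The formula is written from the DDIM-style update expressed in terms of the cumulative products ᾱ; the function names and argument layout are assumptions for illustration. The second function is the deterministic DDIM step, included only to check that the two expressions agree.

```python
import numpy as np

def transfer(x_t, eps, a_t, a_prev):
    """Nonlinear transfer part: move x_t from noise level a_t to a_prev.

    a_t and a_prev are cumulative alpha products (alpha-bar) at the
    current and the target time; eps is the supplied gradient part.
    """
    denom = np.sqrt(a_t) * (np.sqrt((1 - a_prev) * a_t)
                            + np.sqrt((1 - a_t) * a_prev))
    return np.sqrt(a_prev / a_t) * x_t - (a_prev - a_t) / denom * eps

def ddim_step(x_t, eps, a_t, a_prev):
    """Deterministic DDIM update, written via the predicted x_0."""
    x0_pred = (x_t - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1 - a_prev) * eps

rng = np.random.default_rng(0)
x_t, eps = rng.standard_normal(8), rng.standard_normal(8)
moved = transfer(x_t, eps, a_t=0.3, a_prev=0.6)
reference = ddim_step(x_t, eps, a_t=0.3, a_prev=0.6)
```

The transfer part needs neither x_0 nor a random term: it only needs a noise estimate. This is why any gradient part can be plugged into it.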

Gradient part

  • We can use the same gradient part from different classical numerical methods freely.
  • Experiments and theoretical analyses show that the gradient part from different classical methods can work well with our new transfer part.
  • We have provided three kinds of pseudo numerical methods.
  • We use the gradient part of the linear multi-step method and our new transfer part as our main pseudo numerical methods for diffusion models.
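
The gradient part of a linear multi-step method is a weighted combination of the current and earlier noise predictions. The sketch below uses the standard Adams-Bashforth weights of order two and four; the function names and the list-based history are assumptions made for the example.

```python
import numpy as np

def gradient_second_order(history):
    """Two-step combination; history holds noise predictions, newest last."""
    return (3 * history[-1] - history[-2]) / 2

def gradient_fourth_order(history):
    """Four-step combination; history holds noise predictions, newest last."""
    return (55 * history[-1] - 59 * history[-2]
            + 37 * history[-3] - 9 * history[-4]) / 24

# Hypothetical history of noise predictions that grows linearly per step.
history = [np.full(3, float(k)) for k in range(4)]
g2 = gradient_second_order(history)
g4 = gradient_fourth_order(history)
```

In both cases the weights sum to one, so a constant history is returned unchanged, and a linearly changing history is extrapolated half a step ahead.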

Algorithm

  • Algorithm 1 is the original DDIM denoising procedure
  • Algorithm 2 is the new PNDM denoising procedure, which uses the pseudo linear multi-step and pseudo Runge-Kutta methods
  • The linear multi-step method is not self-starting because it needs results from previous steps, so the Runge-Kutta method is used to compute the first three steps
  • S-PNDM uses information from two steps at every step, while F-PNDM uses information from four steps
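
The sampling loop described above can be sketched as follows: three pseudo Runge-Kutta steps to fill the history, then pseudo linear multi-step updates. This is an illustrative reading of the algorithm, not the authors' implementation. The continuous schedule `alpha_bar`, the function names, and the time grid on [0, 1] are assumptions.

```python
import numpy as np

def alpha_bar(t):
    """Hypothetical continuous schedule: 1 at t=0, close to 0 at t=1."""
    return np.exp(-5.0 * t ** 2)

def transfer(x, eps, t, t_prev):
    """Nonlinear transfer part between two times of the schedule."""
    a_t, a_p = alpha_bar(t), alpha_bar(t_prev)
    denom = np.sqrt(a_t) * (np.sqrt((1 - a_p) * a_t)
                            + np.sqrt((1 - a_t) * a_p))
    return np.sqrt(a_p / a_t) * x - (a_p - a_t) / denom * eps

def pseudo_rk_step(eps_model, x, t, dt):
    """Pseudo Runge-Kutta step; returns the new sample and eps at (x, t)."""
    e1 = eps_model(x, t)
    e2 = eps_model(transfer(x, e1, t, t - dt / 2), t - dt / 2)
    e3 = eps_model(transfer(x, e2, t, t - dt / 2), t - dt / 2)
    e4 = eps_model(transfer(x, e3, t, t - dt), t - dt)
    e = (e1 + 2 * e2 + 2 * e3 + e4) / 6
    return transfer(x, e, t, t - dt), e1

def f_pndm_sample(eps_model, x_T, n_steps):
    """Fourth-order sampler sketch; intended for n_steps >= 4."""
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    x, history = x_T, []
    for i in range(n_steps):
        t, t_prev = ts[i], ts[i + 1]
        if i < 3:
            # Start-up phase: the multi-step formula has no history yet.
            x, e = pseudo_rk_step(eps_model, x, t, t - t_prev)
            history.append(e)
        else:
            history.append(eps_model(x, t))  # one network call per step
            e = (55 * history[-1] - 59 * history[-2]
                 + 37 * history[-3] - 9 * history[-4]) / 24
            x = transfer(x, e, t, t_prev)
    return x
```

A simple sanity check is to use the exact noise predictor for a data distribution concentrated on a single point: the sampler should then recover that point up to rounding error. In a real setting `eps_model` would be a pre-trained network.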

Convergence order

  • Changing the transfer part of a numerical method can introduce an unknown error.
  • The local and global errors between the theoretical result and the new methods are computed.

Experiment

Setup

  • Conducted unconditional image generation experiments on four datasets
  • Used pre-trained models from prior works
  • Number of total steps N is 1000
  • Variance schedule is linear variance schedule
  • Used pre-trained model for Cifar10 with cosine variance schedule
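
The two variance schedules can be sketched as below. The concrete numbers (1e-4 to 0.02 for the linear schedule, offset 0.008 and clipping at 0.999 for the cosine schedule) are the values commonly used in prior DDPM work and are assumptions here, since this summary does not state them.

```python
import numpy as np

def linear_betas(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule: beta grows linearly with the step index."""
    return np.linspace(beta_start, beta_end, n_steps)

def cosine_betas(n_steps=1000, s=0.008, max_beta=0.999):
    """Cosine variance schedule defined through the cumulative product."""
    steps = np.arange(n_steps + 1) / n_steps
    f = np.cos((steps + s) / (1 + s) * np.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)  # avoid beta = 1 at the last step

# Cumulative products used by the sampler for each schedule.
alpha_bar_linear = np.cumprod(1 - linear_betas())
alpha_bar_cosine = np.cumprod(1 - cosine_betas())
```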

Sample efficiency and quality

  • Used Fréchet Inception Distance (FID) to evaluate the pseudo linear multi-step methods on Cifar10 and CelebA
  • Compared against the results of the prior work DDIM
  • Used the same pre-trained models to test the numerical methods
  • Used models from iDDPMs to test nonlinear variance schedules
  • Applied fourth-order numerical methods (FON) directly to Equation (10) with the model from DDIM
  • Found FON limited when the number of steps is small
  • New methods S-PNDM and F-PNDM improved results regardless of number of steps
  • F-PNDM achieved a lower FID than 1000-step DDIM using only 50 steps
  • F-PNDM is therefore 20x faster without losing quality
  • F-PNDM improved the best FID by around 0.4 and achieved a new SOTA FID score of 2.71 on CelebA
  • FID results of F-PNDM converge after more than 250 steps
  • Cosine variance schedule lowered FID using large number of steps
  • Visualization results in Appendix A.11 and toy example in Appendix A.8
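
Since all results above are reported as FID, a minimal sketch of the metric may help. It computes the Fréchet distance between two Gaussians fitted to feature vectors. In practice the features come from an Inception network; here they are random stand-ins, and the function name is an assumption.

```python
import numpy as np

def fid_from_features(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Trace of the matrix square root of cov_a @ cov_b, obtained from
    # its eigenvalues (real and non-negative for covariance matrices).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b)
                 - 2 * trace_sqrt)

rng = np.random.default_rng(0)
feats = rng.standard_normal((500, 4))      # stand-in for Inception features
same = fid_from_features(feats, feats)
shifted = fid_from_features(feats, feats + 2.0)
```

Identical feature sets give a distance of zero up to rounding error. Shifting every feature by a constant leaves the covariance unchanged, so the distance reduces to the squared norm of the shift.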

Discussion

  • PNDMs are a new class of numerical methods suitable for solving the ODEs of DDPMs
  • PNDMs can generate high-quality images using fewer steps without loss of quality
  • Can be used on linear and cosine variance schedules
  • Can be used on various types of data
  • Can be used to generate conditional samples
  • Has second-order convergence
  • Can reduce best FID of pre-trained models with shorter sampling time
  • Can reduce best FID by 0.4 points on Cifar10 and CelebA
  • Achieved SOTA FID score of 2.71 on CelebA