Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Langevin diffusions are rapidly convergent under certain conditions.
  • Vemapala and Wibisono (2019) and Chewi et al. (2022) established results for log-Sobolev and Poincaré inequalities respectively.
  • This paper goes beyond Poincaré inequalities and establishes upper and lower bounds for Langevin diffusions and LMC under weak Poincaré inequalities.
  • Results explicitly quantify the effect of the initializer on the performance of the LMC algorithm.
  • Three-step phase transition is unavoidable, as demonstrated by lower bounds.

Paper Content

Introduction

  • Problem of sampling from a target probability density using Langevin Monte Carlo (LMC) algorithm
  • LMC iterations based on discretizing a stochastic differential equation (SDE)
  • Non-asymptotic convergence of LMC studied extensively for log-concave and smooth targets
  • Functional inequality based conditions allow for multi-modality in target density
  • Logarithmic Sobolev inequality (LSI) and Lata la-Oleszkiewicz inequality (LOI) can cover range of tail behavior
  • Research program initiated by [VW19] to provide convergence guarantees for LMC when target density satisfies functional inequality and smoothness condition
  • [CEL + 22] extended framework and proved LSI can be replaced with LOI
  • [VW19] and [CEL + 22] provide state of the art guarantees under minimal set of conditions for LMC
  • Aim to complete program initiated by [VW19] and push convergence analysis of LMC to its limits
  • Study behavior of LMC for potentials that satisfy a family of weak-Poincaré inequalities (WPI)
  • WPI virtually any target density satisfies such an inequality
  • Establish non-asymptotic convergence guarantees for LMC and Langevin diffusion in Rényi divergence
  • Prove WPIs with explicit dimension dependence for two model examples of heavy-tailed distributions
  • Establish lower bounds for complexity of LMC and Langevin diffusion in Rényi divergence
  • Lower bounds indicate slow start behavior with worse dependence on initial divergence
  • Exponential dependence on initial error for LMC and diffusion unavoidable for Cauchy-type targets

Weak poincaré inequalities and rényi convergence of the diffusion

  • We consider a class of functional inequalities introduced by [RW01]
  • [RW01] showed that a certain measure satisfies a WPI with certain parameters
  • The WPI reduces to a classical Poincaré inequality in certain cases
  • The tail properties of the distribution are captured by a function that determines the convergence rate of LMC
  • We present our convergence guarantees under the generic condition (WPI)
  • We use Rényi divergence as a measure of distance between two probability distributions
  • Rényi divergence is related to KL divergence, L ∞ -norm, and χ 2 divergence
  • A bound in Rényi divergence can be translated to a bound in W 2 under certain conditions

Rényi convergence of the langevin diffusion

  • Convergence of Langevin diffusion is known under variance metric or χ2 divergence
  • Theorem 2 characterizes convergence in Rényi divergence
  • Initial error must satisfy Rq’ (ρ 0 π) < ∞ for some q’ > q
  • Classical convergence results require R∞ (ρ 0 π) < ∞
  • Proposition 3 converts WPI with Φ(•) = Osc(f ) 2 to Φ’ such that β’ (r) ≤ β WPI (r)
  • π cannot satisfy a WPI with Φ = Φ’u for u = 2

Langevin monte carlo for heavy-tailed targets

  • LMC is a computer science algorithm
  • The target must satisfy (WPI) and ∇V must be s-Hölder continuous
  • Theorem 4 provides a convergence guarantee for a generic (WPI)
  • The rate of convergence is dependent on ε, m, L, T, R 2 (µ 0 π)
  • An initialization µ 0 can be found such that R q (µ 0 π), R q (µ 0 π) ≤ Õ(d)
  • The rate of convergence is different for targets that satisfy (PI)

Examples

  • Sampling from Cauchy-type measures can be done with convergence guarantees for LMC in Rényi divergence of any finite order.
  • A potential is analyzed as a substitute for xα since the latter does not have continuous gradients.
  • Proposition 5 presents the β WPI estimate for this potential.
  • Corollary 6 establishes convergence guarantees for LMC and the Langevin diffusion.
  • Corollary 8 presents a rate for LMC and the Langevin diffusion for Cauchy-type measures.
  • Initializing with an isotropic Gaussian with an appropriately scaled variance can reduce the complexity of sampling from Cauchy-type measures.

Lower bounds for lmc via variance decay

  • LMC has worse dependence on initial divergence when target has heavier tails
  • Sharp transition at Cauchy-type logarithmic tails, in which case initial error becomes exponential
  • Method for developing lower bounds for LMC convergence rate
  • Notation of complexity introduced
  • Strategy for obtaining lower bounds when initial error is large
  • Variational representations of divergences used to obtain lower bounds

Heavy-tailed potentials and slow starts

  • Assumption of α ∈ [0, 2] for sufficiently large x
  • Dependence on initial error deteriorates as α → 0
  • Three-step phase transition:
  • Lower bounds reproduce dependence of ∆ 0 known in Gaussian setting
  • Lower bounds for smooth ∇V with ∇V (0) = 0 and ∇V (x) x α−1 for large x
  • LMC exhibits slow start behavior in heavy-tailed settings
  • Step size needs to be small enough for discretization to not harm convergence
  • Corollary 12: exponential dependence on initial error unavoidable unless good initialization available

Conclusion

  • We provided convergence guarantees for LMC and Langevin diffusion for target distributions.
  • We obtained guarantees demonstrating that targets with heavier tails lead to a worse dependence on the initial error.
  • The dependence on initial error is a polynomial of order (2−α)2 2α for α > 0, with a phase transition at α = 0.
  • We established lower bounds under generic tail growth conditions that asserted such dependence on the initial error is unavoidable.
  • We left the stability of fixed step size LMC in the number of iterations under heavy-tailed targets as an open direction for future research.
  • We need to show Osc ρt π q/2 2 ≤ ρ0 π q L ∞ (π).
  • We provided WPI estimates for our model examples.
  • We used Lemma 15 to establish WPIs with suitable dimension dependencies.
  • We used Lemma 17 to bound the weights in Lemma 17.
  • We used Lemma 18 to obtain a WPI for π α.
  • We used Lemma 19 to lower bound the Rényi divergence between ρ and π.
  • We used Lemma 20 to lower bound the decay rate of the second moment for the Langevin diffusion and LMC.
  • We used Lemma 21 to control the Rényi or KL divergence using the variance of an isotropic Gaussian initialization.