Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Langevin diffusions converge rapidly under appropriate functional inequality assumptions.
- Vempala and Wibisono (2019) and Chewi et al. (2022) established results under log-Sobolev and Poincaré inequalities, respectively.
- This paper goes beyond Poincaré inequalities, establishing upper and lower bounds for Langevin diffusions and LMC under weak Poincaré inequalities.
- Results explicitly quantify the effect of the initializer on the performance of the LMC algorithm.
- The bounds reveal a three-step phase transition in the dependence on the initial error, and lower bounds show that this transition is unavoidable.
Paper Content
Introduction
- Problem of sampling from a target probability density using Langevin Monte Carlo (LMC) algorithm
- LMC iterations based on discretizing a stochastic differential equation (SDE)
- Non-asymptotic convergence of LMC studied extensively for log-concave and smooth targets
- Functional inequality based conditions allow for multi-modality in target density
- The logarithmic Sobolev inequality (LSI) and the Latała-Oleszkiewicz inequality (LOI) can cover a range of tail behaviors
- Research program initiated by [VW19] to provide convergence guarantees for LMC when the target density satisfies a functional inequality and a smoothness condition
- [CEL+22] extended the framework and proved that LSI can be replaced with LOI
- [VW19] and [CEL+22] provide state-of-the-art guarantees for LMC under a minimal set of conditions
- Aim: complete the program initiated by [VW19] and push the convergence analysis of LMC to its limits
- Study the behavior of LMC for potentials that satisfy a family of weak Poincaré inequalities (WPI)
- Virtually any target density satisfies some WPI
- Establish non-asymptotic convergence guarantees for LMC and Langevin diffusion in Rényi divergence
- Prove WPIs with explicit dimension dependence for two model examples of heavy-tailed distributions
- Establish lower bounds for complexity of LMC and Langevin diffusion in Rényi divergence
- Lower bounds indicate slow start behavior with worse dependence on initial divergence
- Exponential dependence on initial error for LMC and diffusion unavoidable for Cauchy-type targets
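The LMC iteration mentioned above can be sketched as follows. This is a minimal sketch assuming the standard Euler-Maruyama discretization x_{k+1} = x_k − h∇V(x_k) + √(2h) ξ_k of the Langevin SDE dX_t = −∇V(X_t) dt + √2 dB_t, with ξ_k standard Gaussian; the function name `lmc` and its parameters are illustrative, not taken from the paper.

```python
import math
import random


def lmc(grad_V, x0, step, n_iters, rng=None):
    """Run LMC: x_{k+1} = x_k - step * grad_V(x_k) + sqrt(2 * step) * xi_k.

    grad_V  : callable mapping a list of floats to a list of floats (gradient of V)
    x0      : starting point (list of floats)
    step    : step size h
    n_iters : number of iterations
    Returns all iterates, including x0.
    """
    rng = random.Random(0) if rng is None else rng
    noise_scale = math.sqrt(2.0 * step)
    x = list(x0)
    path = [list(x)]
    for _ in range(n_iters):
        g = grad_V(x)
        # Gradient step plus Gaussian noise with variance 2h in each coordinate
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        path.append(list(x))
    return path


# Usage: standard Gaussian target, V(x) = |x|^2 / 2, so grad V(x) = x.
path = lmc(lambda x: list(x), x0=[5.0, -5.0], step=0.05, n_iters=2000)
```

A Gaussian target is used only because its gradient is trivial; any potential with a computable gradient can be passed as `grad_V`.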
Weak Poincaré inequalities and Rényi convergence of the diffusion
- We consider a class of functional inequalities introduced by [RW01]
- [RW01] showed that a broad class of measures satisfies a WPI for a suitable function β_WPI
- The WPI reduces to the classical Poincaré inequality when β_WPI is bounded
- The tail behavior of the distribution is captured by the function β_WPI, which determines the convergence rate of LMC
- We present our convergence guarantees under the generic condition (WPI)
- We use Rényi divergence as a measure of distance between two probability distributions
- Rényi divergence interpolates between familiar distances: R_1 is the KL divergence, R_2 = log(1 + χ²), and R_∞ is the logarithm of the L∞-norm of the density ratio
- A bound in Rényi divergence can be translated into a bound in W_2 under certain conditions
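The relations between Rényi divergence and the other distances can be checked numerically on discrete distributions. This is a minimal sketch assuming finitely supported ρ and π with π > 0 wherever ρ > 0; `renyi` is a hypothetical helper computing R_q(ρ ‖ π) = (q − 1)^{−1} log Σ ρ^q π^{1−q}, with the conventions R_1 = KL (the limit q → 1) and R_∞ = log max ρ/π.

```python
import math


def renyi(rho, pi, q):
    """Rényi divergence R_q(rho || pi) of discrete distributions on a common support.

    Assumes pi[i] > 0 wherever rho[i] > 0. q = 1 returns the KL divergence (the q -> 1 limit);
    q = math.inf returns log max_i rho[i] / pi[i].
    """
    pairs = [(r, p) for r, p in zip(rho, pi) if r > 0]
    if q == 1:
        return sum(r * math.log(r / p) for r, p in pairs)
    if q == math.inf:
        return math.log(max(r / p for r, p in pairs))
    # R_q = log( sum_i rho_i^q * pi_i^(1 - q) ) / (q - 1)
    return math.log(sum(r ** q * p ** (1 - q) for r, p in pairs)) / (q - 1)


rho = [0.5, 0.3, 0.2]
pi = [0.2, 0.3, 0.5]
chi2 = sum(r * r / p for r, p in zip(rho, pi)) - 1.0  # chi-squared divergence
# Expect R_2 = log(1 + chi2) and values nondecreasing in q
for q in (0.5, 1, 2, 5, math.inf):
    print(q, renyi(rho, pi, q))
```

Monotonicity in q is why a guarantee in a higher-order Rényi divergence is stronger than one in KL or χ².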
Rényi convergence of the Langevin diffusion
- Convergence of the Langevin diffusion is known in the variance metric, i.e., in χ² divergence
- Theorem 2 characterizes convergence in Rényi divergence
- The initial error must satisfy R_{q'}(ρ_0 ‖ π) < ∞ for some q' > q
- Classical convergence results require R_∞(ρ_0 ‖ π) < ∞
- Proposition 3 converts a WPI with Φ(f) = Osc(f)² into a WPI with a different functional Φ' whose function β'(r) satisfies β'(r) ≤ β_WPI(r)
- π cannot satisfy a WPI with Φ = Φ'_u for u = 2
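For reference, a weak Poincaré inequality of the kind discussed above can be written as follows. This is a sketch in generic notation: β_WPI, Φ, and Osc follow the bullets above, while the gradient Dirichlet form E_π‖∇f‖² is the standard choice for the Langevin diffusion and is assumed here rather than quoted from the paper.

```latex
\operatorname{Var}_\pi(f) \le \beta_{\mathrm{WPI}}(r)\, \mathbb{E}_\pi\!\left[\|\nabla f\|^2\right] + r\, \Phi(f)
\quad \text{for all } r > 0,
\qquad
\Phi(f) = \operatorname{Osc}(f)^2 = \Bigl(\operatorname*{ess\,sup} f - \operatorname*{ess\,inf} f\Bigr)^2 .
```

If β_WPI(r) ≤ C for all r > 0, letting r → 0 gives Var_π(f) ≤ C E_π‖∇f‖² for every f with Φ(f) < ∞, which is the classical Poincaré inequality.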
Langevin Monte Carlo for heavy-tailed targets
- LMC is a sampling algorithm obtained by discretizing the Langevin diffusion
- The target must satisfy (WPI) and ∇V must be s-Hölder continuous
- Theorem 4 provides a convergence guarantee for a generic (WPI)
- The rate of convergence depends on ε, m, L, T, and R_2(µ_0 ‖ π)
- An initialization µ_0 can be found for which the relevant initial Rényi divergences, such as R_q(µ_0 ‖ π), are at most Õ(d)
- The rate of convergence is different for targets that satisfy (PI)
Examples
- Sampling from Cauchy-type measures can be done with convergence guarantees for LMC in Rényi divergence of any finite order.
- A smooth potential is analyzed as a substitute for |x|^α, since the latter does not have continuous gradients.
- Proposition 5 presents the β_WPI estimate for this potential.
- Corollary 6 establishes convergence guarantees for LMC and the Langevin diffusion.
- Corollary 8 presents a rate for LMC and the Langevin diffusion for Cauchy-type measures.
- Initializing with an isotropic Gaussian with an appropriately scaled variance can reduce the complexity of sampling from Cauchy-type measures.
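The two model targets can be sketched as potentials with closed-form gradients, paired with an isotropic Gaussian initialization. The forms below are assumptions for illustration: a Cauchy-type density π ∝ (1 + |x|²)^{−(d+ν)/2} and the smooth surrogate V(x) = (1 + |x|²)^{α/2} for |x|^α. The paper's exact parametrization may differ, and the initialization scale `sigma` is a hypothetical choice, not the scaling the paper prescribes.

```python
import math
import random


def V_cauchy(x, nu):
    # Assumed Cauchy-type potential: V(x) = (d + nu)/2 * log(1 + |x|^2)
    d = len(x)
    return 0.5 * (d + nu) * math.log(1.0 + sum(xi * xi for xi in x))


def grad_cauchy(x, nu):
    # grad V(x) = (d + nu) * x / (1 + |x|^2); bounded, decays like 1/|x|
    d = len(x)
    sq = sum(xi * xi for xi in x)
    return [(d + nu) * xi / (1.0 + sq) for xi in x]


def V_alpha(x, alpha):
    # Assumed smooth surrogate for |x|^alpha: V(x) = (1 + |x|^2)^(alpha/2)
    return (1.0 + sum(xi * xi for xi in x)) ** (alpha / 2)


def grad_alpha(x, alpha):
    # grad V(x) = alpha * (1 + |x|^2)^(alpha/2 - 1) * x
    sq = sum(xi * xi for xi in x)
    c = alpha * (1.0 + sq) ** (alpha / 2 - 1)
    return [c * xi for xi in x]


def gaussian_init(d, sigma, rng):
    # Isotropic Gaussian N(0, sigma^2 I); sigma is a hypothetical scale here
    return [sigma * rng.gauss(0.0, 1.0) for _ in range(d)]


def lmc_step(x, grad, step, rng):
    # One LMC step: x - h * grad V(x) + sqrt(2h) * N(0, I)
    scale = math.sqrt(2.0 * step)
    return [xi - step * gi + scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, grad)]


# Usage: d = 3, nu = 3, Gaussian start with sigma = sqrt(d) (hypothetical choice).
rng = random.Random(1)
d, nu, step = 3, 3.0, 0.01
x = gaussian_init(d, math.sqrt(d), rng)
for _ in range(1000):
    x = lmc_step(x, grad_cauchy(x, nu), step, rng)
```

Note how the Cauchy-type gradient decays like 1/|x|: far from the origin the drift is weak, which is the mechanism behind the slow start discussed in the lower-bound sections.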
Lower bounds for LMC via variance decay
- LMC has a worse dependence on the initial divergence when the target has heavier tails
- Sharp transition at Cauchy-type logarithmic tails, in which case the dependence on the initial error becomes exponential
- Method for developing lower bounds on the convergence rate of LMC
- Notation for complexity is introduced
- Strategy for obtaining lower bounds when the initial error is large
- Variational representations of divergences are used to obtain lower bounds
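The idea behind variational lower bounds can be illustrated with the Donsker-Varadhan representation KL(ρ ‖ π) = sup_f { E_ρ[f] − log E_π[e^f] }: any test function gives a lower bound, and choosing f(x) = λx² shows that a large second moment forces a large divergence. This sketch swaps in KL and one-dimensional Gaussians (ρ = N(0, s²), π = N(0, 1)) for tractability; it is not the paper's Rényi-based argument, and the helper names are hypothetical.

```python
import math


def kl_gauss_1d(s2):
    """KL(N(0, s2) || N(0, 1)) = (s2 - 1 - log s2) / 2."""
    return 0.5 * (s2 - 1.0 - math.log(s2))


def dv_lower_bound(s2, lam):
    """Donsker-Varadhan bound E_rho[f] - log E_pi[exp f] with f(x) = lam * x^2.

    E_rho[f] = lam * s2, and E_pi[exp(lam * x^2)] = (1 - 2 lam)^(-1/2) requires lam < 1/2.
    """
    if lam >= 0.5:
        raise ValueError("lam must be < 1/2")
    return lam * s2 + 0.5 * math.log(1.0 - 2.0 * lam)


s2 = 25.0                          # spread-out initialization: second moment 25 vs. 1
best_lam = 0.5 * (1.0 - 1.0 / s2)  # maximizer of the bound over lam
print(kl_gauss_1d(s2), dv_lower_bound(s2, best_lam))
```

In this Gaussian case the optimized bound is tight (both printed values agree), so the second moment alone pins down the divergence; for general targets the same recipe only yields an inequality.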
Heavy-tailed potentials and slow starts
- Tail growth is parametrized by α ∈ [0, 2], imposed only for sufficiently large x
- Dependence on initial error deteriorates as α → 0
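- Three-step phase transition in the dependence on the initial error: logarithmic for Gaussian tails (α = 2), polynomial for α ∈ (0, 2), and exponential for Cauchy-type logarithmic tails (α = 0)
- Lower bounds reproduce the dependence on the initial error ∆_0 known in the Gaussian setting
- Lower bounds hold for smooth potentials with ∇V(0) = 0 and ∇V(x) of order x^{α−1} for large x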
- LMC exhibits slow start behavior in heavy-tailed settings
- The step size must be small enough that discretization does not harm convergence
- Corollary 12: for Cauchy-type targets, exponential dependence on the initial error is unavoidable unless a good initialization is available
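A heuristic, noise-free picture of the slow start: under the gradient flow x' = −V'(x), the time to travel from a far-away point x = R back to x = 1 is T(R) = ∫_1^R dx / V'(x). This sketch assumes the one-dimensional surrogate V(x) = (1 + x²)^{α/2}, for which T grows like R^{2−α} when α < 2 and like (1/2) log R when α = 2. It only illustrates why heavier tails slow the early phase; it is not the paper's lower-bound proof, and `drift_travel_time` is a hypothetical helper.

```python
import math


def drift_travel_time(R, alpha, n=100_000):
    """Time for x' = -V'(x), V(x) = (1 + x^2)^(alpha/2), to move from x = R to x = 1.

    Computes T = integral_1^R dx / V'(x) with V'(x) = alpha * x * (1 + x^2)^(alpha/2 - 1),
    using the midpoint rule on n subintervals.
    """
    h = (R - 1.0) / n
    total = 0.0
    for i in range(n):
        x = 1.0 + (i + 0.5) * h
        total += h / (alpha * x * (1.0 + x * x) ** (alpha / 2 - 1))
    return total


# Heavier tails (smaller alpha) mean a weaker drift far out and a longer journey back.
for alpha in (2.0, 1.0, 0.5):
    print(alpha, drift_travel_time(100.0, alpha))
```

For α = 2 the travel time is logarithmic in R, while for α = 1 it is roughly linear in R, which mirrors the move from logarithmic to polynomial dependence on the initial error.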
Conclusion
- We provided convergence guarantees for LMC and the Langevin diffusion for target distributions satisfying a weak Poincaré inequality.
- We obtained guarantees demonstrating that targets with heavier tails lead to a worse dependence on the initial error.
- The dependence on the initial error is polynomial of order (2−α)²/(2α) for α > 0, with a phase transition at α = 0.
- We established lower bounds under generic tail growth conditions that asserted such dependence on the initial error is unavoidable.
- We leave the stability of fixed-step-size LMC under heavy-tailed targets, as the number of iterations grows, as an open direction for future research.
- We need to show Osc((ρ_t/π)^{q/2})² ≤ ‖ρ_0/π‖^q_{L∞(π)}.
- We provided WPI estimates for our model examples.
- We used Lemma 15 to establish WPIs with suitable dimension dependencies.
- We used Lemma 17 to bound the weights appearing in these estimates.
- We used Lemma 18 to obtain a WPI for π_α.
- We used Lemma 19 to lower bound the Rényi divergence between ρ and π.
- We used Lemma 20 to lower bound the decay rate of the second moment for the Langevin diffusion and LMC.
- We used Lemma 21 to control the Rényi or KL divergence using the variance of an isotropic Gaussian initialization.