Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Estimate discrete distribution in KL divergence
- Concentration bounds for Laplace estimator
- Deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$
- Establish matching lower bound, tight up to polylogarithmic factors
Paper Content
Introduction
- Discrete distribution estimation is a fundamental problem in Statistics
- Estimating an arbitrary discrete distribution in Kullback-Leibler (KL) divergence with vanishing probability of error is the goal
- KL divergence is non-negative, unbounded and asymmetric
- Maximum likelihood estimator (empirical estimator) is commonly used
- Minimax rates and concentration bounds are typically studied
- Laplace estimator is commonly used and has a rate of convergence of max
- Concentration bounds of the form “with probability 1 − δ” are desired
- McDiarmid’s inequality can be used for 1 distance
- Best known bound for KL divergence is given by [4]
- Main result is a bound on the minimax rate of the Laplace estimator
- Lower bound on the variance of KL(p p1 ) is established
- Direct consequence of main result is improved sample complexity for estimating a tree-structured Bayesian network
Analysis sketch
- McDiarmid’s inequality is a standard way to provide concentration bounds for discrete distribution estimators
- Lemma 1 (McDiarmid’s inequality) states that if changing a variable changes the absolute value of the function by at most ci, then with probability at least, the bound simplifies to
- The goal is to obtain a good enough bound on c∞(f) to show that nc∞(f)2 decreases to 0
- [6] observes that the 1 distance between the true distribution and the empirical distribution satisfies c∞ ≤ 2/n
- A direct application of McDiarmid’s inequality for KL(p p1) results in a vacuous bound
- To provide a stronger bound, the KL divergence is written as a function of k counts N1, N2, …, Nk
- The counts Nis are not independent of each other, so the Poisson sampling process is used
- A concentration bound for KL under Poisson sampling yields a concentration bound for KL under multinomial sampling
- A high probability bound on c∞(KL) is provided
- Lemma 2 states that the expectations under multinomial and Poisson sampling are similar
- A careful coupling between binomial and Poisson random variables with the same mean is used to obtain bounds on the quantity
Analysis
- Combining equations yields result needed for KL divergence result
- Lemma 4 states that p i = where γ = 311 n + 160k n 3/2
- Both i p i = 1 and i p1 i = 1
- KL(p p ) is bounded with high probability
- Jensen’s inequality used to bound right-hand side of equation
- Lemma 3 and union bound used to bound each term
- McDiarmid’s inequality applied to function with parameter δ/2
- Theorem 3 provides lower bounds on variance of KL divergence of Laplace estimator
- Corollary 1 shows that √ k dependence is tight
- Argument in Theorem 3 can be extended to n k regime
Conclusion
- KL divergence between underlying distribution and Laplace estimator can be bounded
- Previous bound of Õ(k log(1/δ)/n) improved to Õ( √ k log 5/2 (1/δ)/n)
- Lower bound of Ω( √ k/n) on variance and tail bound of KL loss of Laplace estimator established
- Heuristic computation of leading constant done by asymptotic expansion of KL divergence
- As n → ∞, p1 i → p i and Laplace and empirical estimator are similar
- Chi-squared distribution used to approximate k i=1 (p emp i − 1/k) 2
- Poisson and Binomial random variables used to define convenient coupling
- Standard fact used to upper bound equation
- Figure 1 shows sample standard deviation vs. estimated standard deviation of Laplace estimator