Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Algorithm for differentially private mean estimation involves clipping samples and adding noise to their empirical mean.
- Clipping controls sensitivity and variance of noise added for privacy, but introduces statistical bias.
- Tradeoff between low bias, low variance, and low privacy loss is inherent.
- Unbiased mean estimation is possible under approximate differential privacy if distribution is symmetric.
- Unbiased mean estimation is impossible under pure or concentrated differential privacy if data is sampled from a Gaussian.
Paper Content
Introduction
- Goal of statistical inference and machine learning is to learn about a population, not sample
- Differential privacy is the standard framework for addressing privacy concerns
- Differential privacy guarantees that no attacker can infer much more about any one individual in the sample
- Adding the constraint of differential privacy to a statistical inference or machine learning task can incur an inherent cost
- There is a tradeoff between privacy and error, measured by a single loss function
- This paper studies the statistical bias of differentially private estimators, which adds an extra dimension to the tradeoff
- Estimators with little or no bias are desirable
- Existing private estimators with optimal error are significantly biased
- The paper studies mean estimation for a univariate distribution
- Without a privacy constraint, the empirical mean provides optimal error bounds
- Research on private mean estimation has pinned down the optimal mean squared error
- Bias is caused by asymmetry in the distribution
- The paper constructs unbiased private estimators for symmetric distributions
Our results
- Theorem 1.1 provides a lower bound on the MSE of any differentially private estimator for the mean of an arbitrary distribution with bounded variance
- Theorem 1.2 shows that the lower bound in Theorem 1.1 is nearly matched in most parameter regimes
- Theorem 1.3 provides an ( , )-DP algorithm for an unbiased private mean estimator for any symmetric distribution
- Theorem 1.4 shows that an unbiased mean estimator cannot satisfy ( , 0)-DP for any < ∞
Our techniques
- Provide two different methods for proving lower bounds on the MSE of low-bias private estimators
- Negative Results via the Fingerprinting Method (Theorem 1.1): Refinement of the method that separately accounts for the bias and mean squared error of the estimator
- Negative Results via Amplification (Theorem 1.1 Revisited): Proof by contradiction, assume the existence of a private estimator and show that running it on independent datasets and averaging the results would violate previous lower bounds on the mean squared error
- General-Purpose Low-Bias Mean Estimation (Theorem 1.2): Combines noisy-clipped-mean and name-and-shame algorithm
- Unbiased Mean Estimation for Symmetric Distributions (Theorem 1.3): Modify Karwa and Vadhan’s estimator to ensure unbiasedness
- Negative Result for Pure DP Unbiased Mean Estimation (Theorem 1.4): Contradiction, ( ) is uniformly bounded on the entire real line
Related work
- Unbiased estimators are a topic of interest in statistics
- Examples of topics include the MVUE and BLUE
- Results prove certain estimators are optimal
- Examples include the Gauss-Markov theorem, Lehman-Scheffé theorem, and Cramèr-Rao bound
- Little work has considered the bias of private estimators
- Bias-variance tradeoffs of a similar procedure have been examined
- Question of whether unbiased algorithms for mean estimation exist
- Empirically measure the bias induced by various mean estimation algorithms
- Methods for unbiased private estimation exist
- Private statistical estimation has been a topic of much recent interest
- Bias introduced due to clipping is more significant
- Connections between private and robust estimation and between privacy and generalization have been explored
Differential privacy
- Dataset is a collection of elements from a data universe
- Two datasets are neighboring if they differ in at most one entry
- Pure DP is when = 0, approximate DP when > 0
- Pre-processing that respects the neighboring relationship preserves DP
- Group privacy quantifies the privacy guaranteed by a DP algorithm
- ℓ 1 -sensitivity is the sensitivity of a function to changing a single point
- Laplace mechanism satisfies -DP
- DP histograms have maximum error on every bucket depending on privacy parameters
Non-private error of mean estimation
- Mean squared error is asymptotically optimal for univariate Gaussian case
- Empirical mean is asymptotically optimal for Bernoulli data
- Best estimator of mean of Bernoulli distribution is the mean of the conditional distribution
Bias-variance-privacy trilemma for general-purpose estimators
- Algorithm must have high error if it is differentially private and has low bias.
- Two different proofs are provided.
Negative result via fingerprinting
- General result states that sinh( ) captures behaviour in both small and large values of
- Bias and mean absolute error are controlled by parameters and
- Third property and parameter is implied by a bound on the MSE of the estimator
- Mean squared error is the variance of the estimator plus the square of the bias
- Lower bound on error is maximized by setting
- Proof follows the fingerprinting approach
- Proof uses Fingerprinting Derivative Lemma
- Upper bound on quantity is proven using differential privacy
- Parameters are set to maximize lower bound
Negative result via amplification
- Known MSE lower bounds can be used to derive lower bounds on MSE for private estimators with low bias.
- Theorem 3.6 provides a lower bound on the MSE of a private estimator.
- Theorem 3.7 extends the setting of local differential privacy to a setting where a dataset is randomly partitioned into blocks of fixed size.
- Theorem 3.8 shows a bias-variance-privacy tradeoff via shuffling.
- Theorem 3.9 provides a local privacy amplification by shuffling.
- Theorem 3.7 is a direct reduction to Theorem 3.9.
- Theorem 3.7 and 3.9 provide a guarantee about the mean of a private estimator.
- Theorem 3.7 and 3.9 provide a guarantee about the privacy of a private estimator.
- Theorem 3.7 and 3.9 provide a guarantee about the MSE of a private estimator.
- Theorem 3.7 and 3.9 provide a guarantee about the bias of a private estimator.
Low-bias estimators for general distributions
- Describe and analyze algorithms for private estimation with low or no bias
- Provide technical lemmata in Section 4.1
- Three algorithms in Section 4.2: ( , 0)-DP, (0, )-DP, and ( , )-DP
- Best of the three resulting bounds gives Theorem 1.2
Lemmata
- Technical lemmata is required
- Mean squared error of the clipped mean is decomposed into the sum of the sampling error and the (squared) population bias introduced (which is further bounded)
Algorithms
- Positive result based on clipping and adding noise
- Satisfies pure DP
- Analyzing procedure with bounded moments
- Goal is to quantify bias
- Algorithm parameters set to minimize overall error
- Name-and-shame procedure used to achieve unbiased estimate
- Algorithm combines clip-and-noise and name-and-shame
- Satisfies ( , )-DP
- Bias and accuracy properties analyzed
Unbiased estimators for symmetric distributions
- Distribution on R is symmetric if there exists a center of the distribution
- Algorithm is based on Karwa and Vadhan [KV18] with modifications to ensure unbiasedness
- Coarse estimate of mean obtained via DP histogram
- Bucket intervals need to have random offset to ensure unbiasedness and symmetry
Coarse unbiased estimation
- Coarse estimator is similar to Karwa and Vadhan [KV18]
- Uses stability-based histograms to ensure privacy
- Adds random offset to histogram bins to ensure unbiasedness
- Outputs more information than just the argmax
- Privacy follows from privacy of stable histogram algorithm
- Satisfies ( , )-DP
- Estimate is symmetric and unbiased
- Probability of outputting ⊥ is low for appropriately concentrated distributions
- MSE is bounded
- Equivalence under translation
- Distributional equivalence holds jointly
Final algorithm
- Algorithm 2 provides ( , )-DP and the following bias and accuracy properties
- Algorithm 2 can be applied to Gaussians
- Corollary 5.8 provides an unbiased Gaussian mean estimation
- Lemma 5.9 characterizes the symmetry of a clipped random variable from a symmetric distribution
- Theorem 6.2 shows that unbiased estimation is impossible under pure DP for exponential families
- The interval on which the parameter is well-defined must have infinite length
Pure dp estimators are uniformly bounded
- Proposition 6.7 states that a pure DP estimator is uniformly bounded globally.
- Theorem 6.2 states that an -DP algorithm cannot exist for exponential families.
- Morera’s Theorem states that a continuous function is analytic if its closed contour integrals vanish in simply connected regions.
- The Identity Theorem states that two analytic functions that agree locally must agree globally.
B background on measure theory
- A measure space is a set X with a collection of subsets of X and a function
- X is -finite when it can be decomposed into subsets of finite measure
- A function : X → C is said to be measurable if it is a measurable subset of X
- The dominated convergence theorem states that pointwise convergence of a sequence of functions may be interchanged with integration, provided that the sequence is uniformly bounded by an integrable function
- Fubini’s theorem states that switching the order of integration is permitted under measure-theoretic conditions
C impossibility result for concentrated dp
- Concentrated DP is a variant of DP with nice composition properties
- It is intermediate between pure DP and approximate DP
- It captures most common DP algorithms
- It has strong group privacy properties
- There is a bias-variance-privacy tradeoff
- The proof carries through if I[| | > ] is replaced with (| |)
- With probability 1, E[ (| |)] =
- E = ( )
- | − ( )| ≤ 1/2 for some ∈
- E[ | ≠ ⊥] = ( )
- E = ( )
- | ( , )| ≤ ℎ( ) + ( )
- E[ clip [ − , + ] ( ) | ≠ ⊥] = ( )
- D +1 ( ( ) ( ′ )) ≤ ( + 1) 2 for any pairs of datasets , ′ ∈ X
- |E | ≤ E[ 2 ] (exp(D 2 ( )) − 1)