  • Algorithm for differentially private mean estimation involves clipping samples and adding noise to their empirical mean.
  • Clipping controls sensitivity and variance of noise added for privacy, but introduces statistical bias.
  • Tradeoff between low bias, low variance, and low privacy loss is inherent.
  • Unbiased mean estimation is possible under approximate differential privacy if distribution is symmetric.
  • Unbiased mean estimation is impossible under pure or concentrated differential privacy if data is sampled from a Gaussian.
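
The clip-and-noise recipe in the first two bullets can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the function names, the clipping interval [0, 3], and the exponential test data are assumptions chosen for the example, and the noise is Laplace noise scaled to the sensitivity of the clipped mean.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_clipped_mean(samples, lower, upper, epsilon, rng):
    """Clip every sample to [lower, upper], average, then add Laplace noise."""
    clipped = [min(max(x, lower), upper) for x in samples]
    n = len(clipped)
    # Replacing one sample moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(10_000)]  # skewed, hypothetical data
empirical_mean = sum(data) / len(data)
clipped_mean = sum(min(x, 3.0) for x in data) / len(data)
estimate = noisy_clipped_mean(data, 0.0, 3.0, epsilon=1.0, rng=rng)
```

A narrower clipping interval shrinks the noise scale but pulls the clipped mean further from the empirical mean, which is the bias the summary refers to.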

Paper Content


  • Goal of statistical inference and machine learning is to learn about a population, not the sample drawn from it
  • Differential privacy is the standard framework for addressing privacy concerns
  • Differential privacy guarantees that no attacker can infer much more about any one individual in the sample than they could have if that individual's data had not been included
  • Adding the constraint of differential privacy to a statistical inference or machine learning task can incur an inherent cost
  • There is a tradeoff between privacy and error, where error is measured by a single loss function
  • This paper studies the statistical bias of differentially private estimators, which adds an extra dimension to the tradeoff
  • Estimators with little or no bias are desirable
  • Existing private estimators with optimal error are significantly biased
  • The paper studies mean estimation for a univariate distribution
  • Without a privacy constraint, the empirical mean provides optimal error bounds
  • Research on private mean estimation has pinned down the optimal mean squared error
  • Bias is caused by asymmetry in the distribution
  • The paper constructs unbiased private estimators for symmetric distributions
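
To make the link between asymmetry and bias concrete, the sketch below computes the exact bias that clipping introduces for two small finite distributions. Both distributions and the clipping interval [-2, 2] are made-up examples, not taken from the paper.

```python
def clipped_mean_bias(values, probs, lower, upper):
    """Exact bias E[clip(X)] - E[X] for a distribution on finitely many values."""
    mean = sum(p * v for v, p in zip(values, probs))
    clipped = sum(p * min(max(v, lower), upper) for v, p in zip(values, probs))
    return clipped - mean

# Symmetric about 0 and clipped to an interval centered at 0: the tails cancel.
symmetric_bias = clipped_mean_bias([-4, -1, 0, 1, 4], [0.1, 0.2, 0.4, 0.2, 0.1], -2, 2)

# Heavy right tail: clipping removes mass on one side only.
skewed_bias = clipped_mean_bias([0, 1, 10], [0.5, 0.4, 0.1], -2, 2)
```

In the symmetric case the two clipped tails offset each other, while in the skewed case the clipped mean falls below the true mean of 1.4.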

Our results

  • Theorem 1.1 provides a lower bound on the MSE of any differentially private estimator for the mean of an arbitrary distribution with bounded variance
  • Theorem 1.2 shows that the lower bound in Theorem 1.1 is nearly matched in most parameter regimes
  • Theorem 1.3 provides an (ε, δ)-DP algorithm for an unbiased private mean estimator for any symmetric distribution
  • Theorem 1.4 shows that an unbiased mean estimator cannot satisfy (ε, 0)-DP for any ε < ∞

Our techniques

  • Provide two different methods for proving lower bounds on the MSE of low-bias private estimators
  • Negative Results via the Fingerprinting Method (Theorem 1.1): Refinement of the method that separately accounts for the bias and mean squared error of the estimator
  • Negative Results via Amplification (Theorem 1.1 Revisited): Proof by contradiction, assume the existence of a private estimator and show that running it on independent datasets and averaging the results would violate previous lower bounds on the mean squared error
  • General-Purpose Low-Bias Mean Estimation (Theorem 1.2): Combines the noisy clipped mean with the name-and-shame algorithm
  • Unbiased Mean Estimation for Symmetric Distributions (Theorem 1.3): Modify Karwa and Vadhan’s estimator to ensure unbiasedness
  • Negative Result for Pure DP Unbiased Mean Estimation (Theorem 1.4): Proof by contradiction, using the fact that the expectation of a pure DP estimator is uniformly bounded on the entire real line
  • Unbiased estimators are a topic of interest in statistics
  • Examples of topics include the MVUE and BLUE
  • Results prove certain estimators are optimal
  • Examples include the Gauss-Markov theorem, Lehmann-Scheffé theorem, and Cramér-Rao bound
  • Little work has considered the bias of private estimators
  • Bias-variance tradeoffs of a similar procedure have been examined
  • Question of whether unbiased algorithms for mean estimation exist
  • Empirically measure the bias induced by various mean estimation algorithms
  • Methods for unbiased private estimation exist
  • Private statistical estimation has been a topic of much recent interest
  • Bias introduced due to clipping is more significant
  • Connections between private and robust estimation and between privacy and generalization have been explored

Differential privacy

  • Dataset is a collection of elements from a data universe
  • Two datasets are neighboring if they differ in at most one entry
  • Pure DP is when δ = 0, approximate DP when δ > 0
  • Pre-processing that respects the neighboring relationship preserves DP
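  • Group privacy quantifies the privacy guaranteed by a DP algorithm for datasets that differ in more than one entry
  • ℓ1-sensitivity measures how much a function's output can change, in ℓ1 norm, when a single data point changes
  • Laplace mechanism satisfies ε-DP when its noise is scaled to the ℓ1-sensitivity divided by ε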
  • DP histograms have maximum error on every bucket depending on privacy parameters
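
As a small check of the Laplace mechanism bullet, the sketch below compares the output densities of the mechanism on two query values that differ by exactly the sensitivity. The helper names and the evaluation grid are assumptions for illustration. The largest log-density ratio on the grid should not exceed ε.

```python
import math

def laplace_log_density(y, center, scale):
    # Log of the Laplace density with the given center and scale, evaluated at y.
    return -abs(y - center) / scale - math.log(2.0 * scale)

def worst_privacy_loss(value_a, value_b, sensitivity, epsilon, grid):
    """Largest absolute log-density ratio between the two output distributions."""
    scale = sensitivity / epsilon
    return max(
        abs(laplace_log_density(y, value_a, scale)
            - laplace_log_density(y, value_b, scale))
        for y in grid
    )

grid = [i / 10.0 for i in range(-100, 101)]
# Query values 0.0 and 1.0 stand in for the answers on two neighboring datasets.
loss = worst_privacy_loss(0.0, 1.0, sensitivity=1.0, epsilon=0.5, grid=grid)
```

The ratio reaches ε at every output that lies outside the interval between the two query values, and is smaller in between.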

Non-private error of mean estimation

  • Mean squared error of the empirical mean is asymptotically optimal in the univariate Gaussian case
  • Empirical mean is asymptotically optimal for Bernoulli data
  • Best estimator of mean of Bernoulli distribution is the mean of the conditional distribution

Bias-variance-privacy trilemma for general-purpose estimators

  • Algorithm must have high error if it is differentially private and has low bias.
  • Two different proofs are provided.

Negative result via fingerprinting

  • General result states that sinh(ε) captures behaviour for both small and large values of ε
  • Bias and mean absolute error are each controlled by their own parameter
  • A third property, with its own parameter, is implied by a bound on the MSE of the estimator
  • Mean squared error is the variance of the estimator plus the square of the bias
  • Lower bound on error is maximized by setting
  • Proof follows the fingerprinting approach
  • Proof uses Fingerprinting Derivative Lemma
  • Upper bound on quantity is proven using differential privacy
  • Parameters are set to maximize lower bound

Negative result via amplification

  • Known MSE lower bounds can be used to derive lower bounds on MSE for private estimators with low bias.
  • Theorem 3.6 provides a lower bound on the MSE of a private estimator.
  • Theorem 3.7 extends the setting of local differential privacy to a setting where a dataset is randomly partitioned into blocks of fixed size.
  • Theorem 3.8 shows a bias-variance-privacy tradeoff via shuffling.
  • Theorem 3.9 states privacy amplification by shuffling for local differential privacy.
  • Theorem 3.7 is proved via a direct reduction to Theorem 3.9.
  • Theorems 3.7 and 3.9 provide guarantees about the mean, privacy, MSE, and bias of a private estimator.
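
The intuition behind the amplification argument can be seen from how averaging acts on bias and variance. The sketch below is only that intuition, not the paper's proof: it assumes k independent runs of an estimator with a fixed bias and variance, and the numbers are hypothetical.

```python
def mse_of_average(bias, variance, k):
    """MSE of the average of k independent runs of the same estimator.

    Averaging leaves the bias unchanged and divides the variance by k.
    """
    return bias ** 2 + variance / k

single_run = mse_of_average(bias=0.1, variance=4.0, k=1)
averaged = mse_of_average(bias=0.1, variance=4.0, k=400)
```

With low bias, averaging many runs drives the MSE toward the squared bias. If that value fell below a known lower bound for private estimators, the assumed low-bias estimator could not exist.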

Low-bias estimators for general distributions

  • Describe and analyze algorithms for private estimation with low or no bias
  • Provide technical lemmata in Section 4.1
  • Three algorithms in Section 4.2: (ε, 0)-DP, (0, δ)-DP, and (ε, δ)-DP
  • Best of the three resulting bounds gives Theorem 1.2


  • Technical lemmata are required
  • Mean squared error of the clipped mean is decomposed into the sum of the sampling error and the (squared) population bias introduced (which is further bounded)


  • Positive result based on clipping and adding noise
  • Satisfies pure DP
  • Analyzing procedure with bounded moments
  • Goal is to quantify bias
  • Algorithm parameters set to minimize overall error
  • Name-and-shame procedure used to achieve unbiased estimate
  • Algorithm combines clip-and-noise and name-and-shame
  • Satisfies (ε, δ)-DP
  • Bias and accuracy properties analyzed
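
One way to picture the combination of clip-and-noise with name-and-shame is sketched below. It rests on an assumption not spelled out in this summary: that name-and-shame means revealing each record independently with probability δ. Under that assumption, revealing the clipping residual of each selected record and reweighting it by 1/δ restores, in expectation, what clipping removed. The function names and parameters are hypothetical, and the paper's actual algorithm may differ.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def clip(x, lower, upper):
    return min(max(x, lower), upper)

def debiased_private_mean(samples, lower, upper, epsilon, delta, rng):
    n = len(samples)
    # Clip-and-noise part: Laplace noise scaled to the clipped mean's sensitivity.
    clipped_mean = sum(clip(x, lower, upper) for x in samples) / n
    noisy_part = clipped_mean + laplace_noise((upper - lower) / (n * epsilon), rng)
    # Name-and-shame part (assumed form): each record's clipping residual is
    # revealed with probability delta and reweighted by 1/delta, so its
    # expectation equals the average residual removed by clipping.
    correction = sum(
        (x - clip(x, lower, upper)) / delta
        for x in samples
        if rng.random() < delta
    ) / n
    return noisy_part + correction

rng = random.Random(0)
samples = [0.0, 1.0, 10.0]
# Degenerate setting (delta = 1, huge epsilon) used only to expose the algebra.
sanity = debiased_private_mean(samples, 0.0, 2.0, epsilon=1e9, delta=1.0, rng=rng)
```

In the degenerate setting every residual is revealed and the noise is negligible, so the output matches the empirical mean 11/3. For small δ the correction is rarely non-zero but large when it is, which is where the extra variance comes from.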

Unbiased estimators for symmetric distributions

  • Distribution on R is symmetric if there exists a center about which the distribution is unchanged under reflection
  • Algorithm is based on Karwa and Vadhan [KV18] with modifications to ensure unbiasedness
  • Coarse estimate of mean obtained via DP histogram
  • Bucket intervals need to have random offset to ensure unbiasedness and symmetry

Coarse unbiased estimation

  • Coarse estimator is similar to Karwa and Vadhan [KV18]
  • Uses stability-based histograms to ensure privacy
  • Adds random offset to histogram bins to ensure unbiasedness
  • Outputs more information than just the argmax
  • Privacy follows from privacy of stable histogram algorithm
  • Satisfies (ε, δ)-DP
  • Estimate is symmetric and unbiased
  • Probability of outputting ⊥ is low for appropriately concentrated distributions
  • MSE is bounded
  • Equivalence under translation
  • Distributional equivalence holds jointly

Final algorithm

  • Algorithm 2 provides (ε, δ)-DP and the following bias and accuracy properties
  • Algorithm 2 can be applied to Gaussians
  • Corollary 5.8 provides an unbiased Gaussian mean estimation
  • Lemma 5.9 characterizes the symmetry of a clipped random variable from a symmetric distribution
  • Theorem 6.2 shows that unbiased estimation is impossible under pure DP for exponential families
  • The interval on which the parameter is well-defined must have infinite length

Pure DP estimators are uniformly bounded

  • Proposition 6.7 states that a pure DP estimator is uniformly bounded globally.
  • Theorem 6.2 states that an unbiased ε-DP estimator cannot exist for exponential families.
  • Morera’s Theorem states that a continuous function is analytic if its closed contour integrals vanish in simply connected regions.
  • The Identity Theorem states that two analytic functions that agree locally must agree globally.

B Background on measure theory

  • A measure space is a set X together with a σ-algebra of subsets of X and a measure defined on that σ-algebra
  • X is σ-finite when it can be decomposed into countably many subsets of finite measure
  • A function f : X → C is said to be measurable if the preimage of every measurable set is a measurable subset of X
  • The dominated convergence theorem states that the pointwise limit of a sequence of functions may be interchanged with integration, provided that the sequence is dominated by an integrable function
  • Fubini’s theorem states that switching the order of integration is permitted for integrable functions on a product of σ-finite measure spaces

C Impossibility result for concentrated DP

  • Concentrated DP is a variant of DP with nice composition properties
  • It is intermediate between pure DP and approximate DP
  • It captures most common DP algorithms
  • It has strong group privacy properties
  • There is a bias-variance-privacy tradeoff
  • The proof carries through if the indicator I[|·| > ·] is replaced with a function of |·|
  • With probability 1, E[ (| |)] =
  • E = ( )
  • | − ( )| ≤ 1/2 for some ∈
  • E[ | ≠ ⊥] = ( )
  • E = ( )
  • | ( , )| ≤ ℎ( ) + ( )
  • E[ clip [ − , + ] ( ) | ≠ ⊥] = ( )
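  • Concentrated DP bounds the Rényi divergence between the algorithm's outputs on any pair of datasets, at every order, by a quantity that grows linearly with the order
  • An expectation is bounded in absolute value in terms of a second moment E[·²] and the factor exp(D₂(·)) − 1, where D₂ is the Rényi divergence of order 2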