Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Proposed a novel Bayesian inference framework for distributed differentially private linear regression
  • Data is split between multiple parties who share summary statistics in a privacy-preserving way
  • Developed a novel generative statistical model for privately shared statistics
  • Bayesian estimation of regression coefficients is conducted using Markov chain Monte Carlo algorithms
  • Also provided a fast version to perform Bayesian estimation in one iteration
  • Proposed methods have computational advantages over competitors
  • Numerical results on real and simulated data demonstrate well-rounded estimation and prediction

Paper Content

Introduction

  • Linear regression is a mathematical method used in statistical research
  • Many researchers have been working on linear regression since the 19th century
  • Differential privacy is one of the most widely used formal definitions of data privacy
  • There is a growing interest in differentially private linear regression
  • General-purpose Bayesian differentially private estimation methods can be used in regression problems
  • Hierarchical model for privatised data and Bayesian estimation for the model parameters
  • Differential privacy mechanisms for posterior sampling and linear regression
  • General-purpose differentially private Markov chain Monte Carlo (MCMC) algorithms can be applied to regression
  • Perturbing polynomial objective functions with privacy-preserving noise
  • Perturbation of summary statistics
  • Point estimation of the linear regression parameters
  • Confidence intervals for the coefficients of linear regression
  • Rates of convergence for parameter estimation with differential privacy
  • Distributed setting where the total dataset is shared among multiple parties
  • Adding noise to summary statistics of linear regression
  • Fast Bayesian estimation methods
  • MCMC algorithms for iterative sampling from posterior distributions

Differential privacy

  • Differential privacy is a property of a randomised algorithm that takes in sensitive data and returns a random output.
  • The level of privacy is determined by two parameters, ε and δ; smaller values mean stronger privacy.
  • Noise-adding mechanisms are used to preserve privacy, with the Gaussian mechanism being a popular one.
  • This paper focuses on (ε, δ)-DP and the Gaussian mechanism to generate noisy observations.
  • The paper presents a hierarchical model for differentially private distributed linear regression.
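As a rough illustration of the Gaussian mechanism mentioned above, the sketch below adds Gaussian noise calibrated with the classical bound σ = Δ·sqrt(2·ln(1.25/δ))/ε, which holds for 0 < ε < 1. The function names, the sensitivity value, and the example statistic are assumptions made for illustration; the paper may calibrate its noise differently.

```python
import math
import numpy as np

def gaussian_mechanism_sigma(sensitivity, epsilon, delta):
    """Noise std from the classical Gaussian-mechanism bound (needs 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this bound assumes 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def privatise(value, sensitivity, epsilon, delta, rng):
    """Return value plus i.i.d. Gaussian noise with the calibrated std."""
    sigma = gaussian_mechanism_sigma(sensitivity, epsilon, delta)
    value = np.asarray(value, dtype=float)
    return value + rng.normal(0.0, sigma, size=value.shape)

rng = np.random.default_rng(0)
true_stat = np.array([0.3, -0.1])   # hypothetical statistic to be released
noisy_stat = privatise(true_stat, sensitivity=0.01, epsilon=0.5, delta=1e-5, rng=rng)
```

The L2 sensitivity has to be derived for the specific statistic being released (for example, by bounding the norm of each data point); the value 0.01 here is only a placeholder.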

Basic model and privacy setup

  • We have a sequence of random variables (x_i, y_i)
  • We consider normal linear regression to model the dependency between x_i and y_i
  • We assume the feature vectors x_i are i.i.d. with a normal distribution
  • We define summary statistics of X and y
  • We assume a setup where S and z are privately released
  • We set up a hierarchical model to enable Bayesian inference of θ
  • We use the exact conditional distribution p(z | S, θ, σ^2)
  • Our model has a different hierarchical structure and requires less privacy-preserving noise
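To make the setup above concrete, here is a minimal sketch assuming the summary statistics are S = XᵀX and z = Xᵀy. Under the normal linear model y = Xθ + e with e ~ N(0, σ²I), it follows that z given S is normal with mean Sθ and covariance σ²S, which is the kind of exact conditional the model relies on. The noise scales `sigma_s` and `sigma_z` and the symmetric-noise construction are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 3
theta = np.array([1.0, -2.0, 0.5])   # hypothetical regression coefficients
sigma_y = 0.5                        # hypothetical response noise std

X = rng.normal(size=(n, d))          # normally distributed features
y = X @ theta + sigma_y * rng.normal(size=n)

# Summary statistics of the regression problem.
S = X.T @ X
z = X.T @ y

# Exact conditional of z given S: z | S, theta, sigma^2 ~ N(S theta, sigma^2 S).
cond_mean = S @ theta
cond_cov = sigma_y**2 * S

# Private release: add Gaussian noise to both statistics (scales are assumed here).
sigma_s, sigma_z = 2.0, 2.0
E = rng.normal(0.0, sigma_s, size=(d, d))
E_sym = np.triu(E) + np.triu(E, 1).T  # symmetric noise keeps the released matrix symmetric
S_hat = S + E_sym
z_hat = z + rng.normal(0.0, sigma_z, size=d)
```

In a real release, `sigma_s` and `sigma_z` would be set from the sensitivities of S and z and the chosen (ε, δ) budget.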

Distributed setting

  • Model extended to distributed setting
  • Data shared among J ≥ 1 data holders
  • Each data holder shares summary statistics with privacy-preserving noise
  • Hierarchical structure of model specified for normally distributed x_i's
  • Node-specific observations more informative on θ than aggregate versions
  • Partitioning of data relevant to data privacy applications outside distributed learning framework
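The distributed setting can be sketched as follows: the data are split among J holders, and each holder releases its own noisy pair of summary statistics. The helper name `noisy_summary`, the noise scales, and the even split are assumptions for illustration only.

```python
import numpy as np

def noisy_summary(X_j, y_j, sigma_s, sigma_z, rng):
    """Noisy (S_j, z_j) for one data holder; noise scales are assumed inputs."""
    d = X_j.shape[1]
    E = rng.normal(0.0, sigma_s, size=(d, d))
    E_sym = np.triu(E) + np.triu(E, 1).T       # symmetric noise matrix
    S_hat = X_j.T @ X_j + E_sym
    z_hat = X_j.T @ y_j + rng.normal(0.0, sigma_z, size=d)
    return S_hat, z_hat

rng = np.random.default_rng(2)
n, d, J = 600, 2, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -1.0]) + 0.3 * rng.normal(size=n)

# Each of the J holders sees only its own shard and shares only noisy statistics.
shares = [
    noisy_summary(X_j, y_j, sigma_s=1.0, sigma_z=1.0, rng=rng)
    for X_j, y_j in zip(np.array_split(X, J), np.array_split(y, J))
]

# Aggregating is possible, but it discards the node-specific structure.
S_sum = sum(S_hat for S_hat, _ in shares)
z_sum = sum(z_hat for _, z_hat in shares)
```

Keeping the J pairs separate, rather than working only with `S_sum` and `z_sum`, is what lets a hierarchical model use the node-specific observations.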

Algorithms for Bayesian inference

  • Bayesian inference targets the posterior distribution of latent variables
  • Present several Bayesian inference algorithms for hierarchical model
  • Two cases considered: normal and non-normal feature distribution P_x
  • MCMC algorithm and closed form solution for posterior of θ developed

Normally distributed features

  • MCMC algorithm presented for Bayesian inference for differentially private distributed linear regression model
  • Latent variables involved: θ, Σ_x, σ_y^2, S_{1:J}, z_{1:J}
  • Poor convergence due to high posterior correlation between θ and z_{1:J}
  • Reduced model with θ, Σ_x, σ_y^2 as latent variables
  • Closed-form full conditional distributions for θ and Σ_x
  • Metropolis-Hastings moves to update S_{1:J} and σ_y^2
  • Wishart distribution used to update S_j
  • Adaptive MCMC framework used to target acceptance rate of 0.2
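The idea of adapting a Metropolis-Hastings proposal toward a target acceptance rate can be sketched on a toy problem. The example below is not the paper's sampler: it swaps in a one-dimensional random-walk proposal on a standard normal target and adapts the log step size with a Robbins-Monro style update. The function name, the adaptation exponent 0.6, and the target density are all illustrative assumptions.

```python
import numpy as np

def adaptive_rw_metropolis(log_target, x0, n_iter, target_rate=0.2, seed=0):
    """Random-walk MH whose step size is tuned toward target_rate."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    log_p = log_target(x)
    log_step = 0.0                      # log of the proposal std
    samples = np.empty(n_iter)
    accepted = 0
    for t in range(n_iter):
        proposal = x + np.exp(log_step) * rng.normal()
        log_p_prop = log_target(proposal)
        # MH acceptance probability for a symmetric proposal.
        accept_prob = np.exp(min(0.0, log_p_prop - log_p))
        if rng.uniform() < accept_prob:
            x, log_p = proposal, log_p_prop
            accepted += 1
        # Diminishing adaptation: grow the step if accepting too often, shrink otherwise.
        log_step += (accept_prob - target_rate) / (t + 1) ** 0.6
        samples[t] = x
    return samples, accepted / n_iter

samples, rate = adaptive_rw_metropolis(lambda x: -0.5 * x**2, x0=0.0, n_iter=20000)
```

The diminishing step in the adaptation is a common way to keep the adapted chain targeting the right distribution; a sampler for S_j or σ_y^2 would replace the toy target and proposal with the model's own.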

Features with a general distribution

  • Normality assumption for the x_i's may not be adequate for some data sets
  • Updating the S_j's can be a bottleneck in terms of computation time and convergence
  • The fixed-S algorithms provide accurate estimates even for normally distributed features
  • Estimate the S_j's at the start and keep them fixed during the inference procedure
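Once the S_j's are fixed, the posterior of θ can be available in closed form under additional assumptions. The sketch below assumes a Gaussian prior θ ~ N(m, C), known σ_y^2 and noise variance σ_z^2, and treats each released ẑ_j as N(S_j θ, σ_y^2 S_j + σ_z^2 I); the posterior is then the standard linear-Gaussian update. The function names, the eigenvalue-clipping projection, and all numeric values are illustrative assumptions and may differ from the paper's fast algorithm.

```python
import numpy as np

def nearest_psd(A, floor=1e-8):
    """Symmetrise A and clip its eigenvalues so it is a valid covariance factor."""
    A = 0.5 * (A + A.T)
    w, V = np.linalg.eigh(A)
    return (V * np.maximum(w, floor)) @ V.T

def posterior_theta_fixed_S(S_list, z_list, sigma_y2, sigma_z2, prior_mean, prior_cov):
    """Gaussian posterior of theta when the S_j's are treated as fixed."""
    d = len(prior_mean)
    precision = np.linalg.inv(prior_cov)
    shift = precision @ prior_mean
    for S_j, z_j in zip(S_list, z_list):
        V_j = sigma_y2 * S_j + sigma_z2 * np.eye(d)         # covariance of z_j given theta
        W = np.linalg.solve(V_j, np.column_stack([S_j, z_j]))
        precision = precision + S_j.T @ W[:, :d]             # S_j' V_j^{-1} S_j
        shift = shift + S_j.T @ W[:, d]                      # S_j' V_j^{-1} z_j
    cov = np.linalg.inv(precision)
    return cov @ shift, cov

rng = np.random.default_rng(3)
J, n_j, d = 4, 200, 2
theta_true = np.array([1.0, -1.0])
sigma_y, sigma_s, sigma_z = 0.5, 1.0, 1.0
S_list, z_list = [], []
for _ in range(J):
    X = rng.normal(size=(n_j, d))
    y = X @ theta_true + sigma_y * rng.normal(size=n_j)
    E = rng.normal(0.0, sigma_s, size=(d, d))
    S_list.append(nearest_psd(X.T @ X + np.triu(E) + np.triu(E, 1).T))
    z_list.append(X.T @ y + rng.normal(0.0, sigma_z, size=d))

post_mean, post_cov = posterior_theta_fixed_S(
    S_list, z_list, sigma_y**2, sigma_z**2, np.zeros(d), 10.0 * np.eye(d)
)
```

Because no sampling of the S_j's is involved, this update costs one pass over the J nodes, which is the appeal of fixing them.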

Extensions

  • Variants of methodology mentioned in Appendix B
  • Average feature vectors in X and corresponding response variables in y to make them approximately normal
  • Details of this approach in Appendix B.1
  • If features are normally distributed but data not centred, need to include intercept parameter and modify hierarchical model in Appendix B.2

Numerical experiments

  • MCMC-normalX, MCMC-fixedS, and Bayes-fixedS-fast algorithms are evaluated numerically
  • Compared to adaSSP of Wang (2018) and MCMC method of Bernstein and Sheldon (2019)
  • Extensions of adaSSP and MCMC-B&S for J ≥ 1 implemented
  • Model in Bernstein and Sheldon (2019) generalised for J ≥ 1
  • Code to replicate experiments available at given URL

Experiments with simulated data

  • Considered two different configurations for problem size
  • Generated data with certain parameters
  • Used same parameters for inference
  • Evaluated methods at different combinations of J and ε
  • Ran MCMC algorithms for 10,000 iterations
  • Looked at mean squared errors of estimates and predictions
  • MCMC-fixedS and Bayes-fixedS-fast outperformed adaSSP and MCMC-B&S
  • MCMC-normalX better at d = 2, MCMC-B&S better at d = 5
  • All methods improved as ε grows
  • Compared computation times of MCMC algorithms
  • MCMC-B&S slowed down by a computational cost of O(d^6)

Experiments with real data

  • Used four different data sets from UCI Machine Learning Repository
  • Disregarded columns with string data or key values
  • Treated the rightmost column as the response y
  • 80% of data used for training, 20% for testing
  • Average prediction performances presented in Table 1
  • MCMC-fixedS and Bayes-fixedS-fast are the most stable
  • MCMC-fixedS and Bayes-fixedS-fast beat adaSSP and MCMC-B&S when J > 1
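The evaluation protocol described above (rightmost column as y, an 80/20 train/test split, and prediction error on the held-out part) can be sketched as below. The data here are synthetic stand-ins rather than a UCI data set, and the non-private least-squares fit is only a placeholder for whichever estimator is being evaluated; both are assumptions for illustration.

```python
import numpy as np

def train_test_split_80_20(data, rng):
    """Shuffle the rows and split them 80% / 20%."""
    idx = rng.permutation(len(data))
    cut = (4 * len(data)) // 5
    return data[idx[:cut]], data[idx[cut:]]

def prediction_mse(theta_hat, X_test, y_test):
    """Mean squared prediction error of a linear predictor on held-out data."""
    return float(np.mean((y_test - X_test @ theta_hat) ** 2))

rng = np.random.default_rng(4)
data = rng.normal(size=(100, 4))          # synthetic table; last column plays the role of y
train, test = train_test_split_80_20(data, rng)
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]

# Placeholder estimator: ordinary least squares (not differentially private).
theta_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
mse = prediction_mse(theta_hat, X_test, y_test)
```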

Conclusion

  • Propose a novel Bayesian inference framework for a differentially private distributed linear regression setting
  • Exploit the conditional structure between the summary statistics of linear regression
  • Numerical experiments show proposed methods are competitive with state-of-the-art alternatives
  • Room for improvement of MCMC-normalX
  • Full conditional distribution of Σ_x and θ
  • Acceptance ratio for the MH update of S_j and σ_y^2
  • Extensions mentioned in Section 4.4 indicate potential future directions
  • Extension of Bernstein and Sheldon (2019) suited to observations
  • Model includes b = x, Σx, and S0
  • Extension of adaSSP (Wang, 2018) for J ≥ 1
  • Calculate D × 1 mean vector and D × D covariance matrix