Abstract
- Proposed a novel Bayesian inference framework for distributed differentially private linear regression
- Data is split between multiple parties who share summary statistics in a privacy-preserving way
- Developed a novel generative statistical model for privately shared statistics
- Bayesian estimation of regression coefficients is conducted using Markov chain Monte Carlo algorithms
- Also provided a fast version to perform Bayesian estimation in one iteration
- Proposed methods have computational advantages over competitors
- Numerical results on real and simulated data demonstrate well-rounded estimation and prediction
Paper Content
Introduction
- Linear regression is a mathematical method used in statistical research
- Many researchers have been working on linear regression since the 19th century
- Differential privacy is the most commonly used definition for privacy
- There is a growing interest in differentially private linear regression
- General-purpose Bayesian differentially private estimation methods can be used in regression problems
- Hierarchical model for privatised data and Bayesian estimation for the model parameters
- Differential privacy mechanisms for posterior sampling and linear regression
- General-purpose differentially private Markov chain Monte Carlo (MCMC) algorithms can be applied to regression
- Perturbing polynomial objective functions with privacy-preserving noise
- Perturbation of summary statistics
- Point estimation of the linear regression parameters
- Confidence intervals for the coefficients of linear regression
- Rates of convergence for parameter estimation with differential privacy
- Distributed setting where the total dataset is shared among multiple parties
- Adding noise to summary statistics of linear regression
- Fast Bayesian estimation methods
- MCMC algorithms for iterative sampling from posterior distributions
Differential privacy
- Differential privacy is a property of a randomised algorithm that takes in sensitive data and returns a random output.
- The privacy level is determined by two parameters, ε and δ.
- Noise-adding mechanisms are used to preserve privacy, with the Gaussian mechanism being a popular one.
- This paper focuses on (ε, δ)-DP and the Gaussian mechanism to generate noisy observations.
- The paper presents a hierarchical model for differentially private distributed linear regression.
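As an illustration of the Gaussian mechanism mentioned above, here is a minimal sketch. It assumes the classical noise calibration σ = Δ · sqrt(2 ln(1.25/δ)) / ε, which holds for 0 < ε < 1; the paper may calibrate its noise differently. The function names and the example values are hypothetical.

```python
import math
import numpy as np

def gaussian_sigma(sensitivity, eps, delta):
    """Noise std of the classical Gaussian mechanism (assumes 0 < eps < 1)."""
    if not (0 < eps < 1) or not (0 < delta < 1):
        raise ValueError("classical calibration assumes 0 < eps < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps

def gaussian_mechanism(value, sensitivity, eps, delta, rng):
    """Return value plus i.i.d. N(0, sigma^2) noise on every entry."""
    sigma = gaussian_sigma(sensitivity, eps, delta)
    value = np.asarray(value, dtype=float)
    return value + rng.normal(0.0, sigma, size=value.shape)

rng = np.random.default_rng(0)
true_stat = np.array([1.0, 2.0, 3.0])   # hypothetical statistic with L2 sensitivity 1
noisy_stat = gaussian_mechanism(true_stat, sensitivity=1.0, eps=0.5, delta=1e-5, rng=rng)
```

Smaller ε or δ gives a larger σ, so stronger privacy is paid for with noisier released statistics.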
Basic model and privacy setup
- We have a sequence of random variables (x_i, y_i)
- We consider normal linear regression to model the dependency between x_i and y_i
- We assume the feature vectors x_i are i.i.d. with a normal distribution
- We define summary statistics of X and y
- We assume a setup where S and z are privately released
- We set up a hierarchical model to enable Bayesian inference of θ
- We use the exact conditional distribution p(z | S, θ, σ^2)
- Our model has a different hierarchical structure and requires less privacy-preserving noise
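The setup above can be sketched as follows, assuming the usual summary statistics S = XᵀX and z = Xᵀy. Under normal linear regression, z given S then has mean Sθ and covariance σ²S, because z − Sθ = Xᵀ(y − Xθ). The symmetric-noise release, the noise scales sigma_s and sigma_z, and all variable names are illustrative assumptions; the noise is not calibrated to any privacy budget here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 3
theta = np.array([0.5, -1.0, 2.0])    # hypothetical true coefficients
sigma_y = 0.7                          # hypothetical response noise std

X = rng.normal(size=(n, d))            # i.i.d. normal features
y = X @ theta + sigma_y * rng.normal(size=n)

S = X.T @ X                            # d x d summary statistic
z = X.T @ y                            # d x 1 summary statistic

# Exact conditional moments of z given S, theta and sigma_y^2
cond_mean = S @ theta
cond_cov = sigma_y**2 * S

def release(S, z, sigma_s, sigma_z, rng):
    """Add symmetric Gaussian noise to S and Gaussian noise to z."""
    d = S.shape[0]
    upper = np.triu(rng.normal(0.0, sigma_s, size=(d, d)))
    noise_S = upper + np.triu(upper, 1).T      # mirror the strict upper triangle
    return S + noise_S, z + rng.normal(0.0, sigma_z, size=d)

S_hat, z_hat = release(S, z, sigma_s=5.0, sigma_z=5.0, rng=rng)
```

Only S_hat and z_hat would leave the data holder; S and z stay latent and are inferred through the hierarchical model.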
Distributed setting
- Model extended to distributed setting
- Data shared among J ≥ 1 data holders
- Each data holder shares summary statistics with privacy-preserving noise
- Hierarchical structure of model specified for normally distributed x_i's
- Node-specific observations more informative on θ than aggregate versions
- Partitioning of data relevant to data privacy applications outside distributed learning framework
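A minimal sketch of the distributed setting, assuming the rows are split evenly among J data holders and each holder releases its own noisy pair of summary statistics. The helper noisy_stats, the noise scales, and the data sizes are hypothetical and are not calibrated to a privacy budget.

```python
import numpy as np

def noisy_stats(X_j, y_j, sigma_s, sigma_z, rng):
    """Node-level summary statistics with additive Gaussian noise."""
    d = X_j.shape[1]
    upper = np.triu(rng.normal(0.0, sigma_s, size=(d, d)))
    sym_noise = upper + np.triu(upper, 1).T    # keep the released matrix symmetric
    S_hat = X_j.T @ X_j + sym_noise
    z_hat = X_j.T @ y_j + rng.normal(0.0, sigma_z, size=d)
    return S_hat, z_hat

rng = np.random.default_rng(2)
n, d, J = 600, 2, 3
theta = np.array([1.0, -0.5])                  # hypothetical true coefficients
X = rng.normal(size=(n, d))
y = X @ theta + 0.5 * rng.normal(size=n)

# Each data holder j owns one block of rows and shares only (S_hat_j, z_hat_j)
parts = np.array_split(np.arange(n), J)
shares = [noisy_stats(X[idx], y[idx], 2.0, 2.0, rng) for idx in parts]

# Aggregating discards the per-node structure that the model can otherwise exploit
S_sum = sum(S_hat for S_hat, _ in shares)
z_sum = sum(z_hat for _, z_hat in shares)
```

Without noise the node-level statistics add up exactly to XᵀX and Xᵀy; with noise, the sum carries J independent noise terms, which is one way to see why keeping the node-specific observations can be more informative.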
Algorithms for bayesian inference
- Bayesian inference targets the posterior distribution of latent variables
- Present several Bayesian inference algorithms for hierarchical model
- Two cases considered: normal and non-normal P_x
- MCMC algorithm and closed form solution for posterior of θ developed
Normally distributed features
- MCMC algorithm presented for Bayesian inference for differentially private distributed linear regression model
- Latent variables involved: θ, Σ_x, σ_y^2, S_{1:J}, z_{1:J}
- Poor convergence due to high posterior correlation between θ and z_{1:J}
- Reduced model with θ, Σ_x, σ_y^2 as latent variables
- Closed-form full conditional distributions for θ and Σ_x
- Metropolis-Hastings moves to update S_{1:J} and σ_y^2
- Wishart distribution used to update S_j
- Adaptive MCMC framework used to target acceptance rate of 0.2
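To illustrate the idea of adapting a Metropolis-Hastings proposal towards a target acceptance rate, here is a generic sketch on a toy one-dimensional target. It is not the paper's sampler: it uses a Gaussian random-walk proposal rather than a Wishart one, and the Robbins-Monro step size t^(-0.6) is an assumption chosen for illustration.

```python
import numpy as np

def adaptive_rw_metropolis(log_target, x0, n_iter, target_rate=0.2, rng=None):
    """Random-walk Metropolis whose proposal scale is tuned towards target_rate."""
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    logp = log_target(x)
    log_step = 0.0                       # log of the proposal std
    samples = np.empty(n_iter)
    accepts = 0
    for t in range(1, n_iter + 1):
        prop = x + np.exp(log_step) * rng.normal()
        logp_prop = log_target(prop)
        acc_prob = np.exp(min(0.0, logp_prop - logp))
        if rng.uniform() < acc_prob:
            x, logp = prop, logp_prop
            accepts += 1
        # Widen the proposal when accepting too often, shrink it otherwise
        log_step += (t ** -0.6) * (acc_prob - target_rate)
        samples[t - 1] = x
    return samples, accepts / n_iter, float(np.exp(log_step))

# Toy target: standard normal log-density up to an additive constant
samples, acc_rate, step = adaptive_rw_metropolis(
    lambda x: -0.5 * x**2, x0=0.0, n_iter=20000, rng=np.random.default_rng(3)
)
```

The diminishing step size makes the adaptation fade over time, so the empirical acceptance rate settles near the target instead of oscillating.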
Features with a general distribution
- Normality assumption for x_i's may not be adequate for some data sets
- Updating S_j's can be a bottleneck in terms of computation time and convergence
- Algorithms provide accurate estimations even for normally distributed features
- Estimate S_j's from the beginning and fix them during inference procedure
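Once the S_j's are fixed, the posterior of θ becomes available in closed form under Gaussian assumptions. The sketch below assumes a N(m, C) prior on θ, a known σ_y^2, and noisy observations ẑ_j | θ ~ N(S_j θ, σ_y^2 S_j + σ_z^2 I); these modelling choices and all names are assumptions made for illustration, not the paper's exact algorithm. For simplicity the demo plugs in the exact S_j's where the method would use estimates.

```python
import numpy as np

def posterior_theta_fixed_S(S_list, z_hat_list, sigma_y2, sigma_z2, prior_mean, prior_cov):
    """Gaussian posterior of theta given fixed S_j's and noisy z_hat_j's."""
    d = prior_mean.shape[0]
    precision = np.linalg.inv(prior_cov)
    shift = precision @ prior_mean
    for S_j, z_hat_j in zip(S_list, z_hat_list):
        U_j = sigma_y2 * S_j + sigma_z2 * np.eye(d)     # covariance of z_hat_j given theta
        precision = precision + S_j.T @ np.linalg.solve(U_j, S_j)
        shift = shift + S_j.T @ np.linalg.solve(U_j, z_hat_j)
    post_cov = np.linalg.inv(precision)
    return post_cov @ shift, post_cov

rng = np.random.default_rng(4)
n, d, J = 2000, 2, 2
theta = np.array([1.0, -2.0])                            # hypothetical true coefficients
sigma_y2, sigma_z2 = 0.25, 1.0
X = rng.normal(size=(n, d))
y = X @ theta + np.sqrt(sigma_y2) * rng.normal(size=n)

parts = np.array_split(np.arange(n), J)
S_list = [X[idx].T @ X[idx] for idx in parts]            # stand-in for fixed estimates of S_j
z_hat_list = [X[idx].T @ y[idx] + np.sqrt(sigma_z2) * rng.normal(size=d) for idx in parts]

post_mean, post_cov = posterior_theta_fixed_S(
    S_list, z_hat_list, sigma_y2, sigma_z2, np.zeros(d), 10.0 * np.eye(d)
)
```

Because no S_j is updated, a single pass over the J nodes suffices, which is the computational appeal of fixing the S_j's.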
Extensions
- Variants of methodology mentioned in Appendix B
- Average feature vectors in X and corresponding response variables in y to make them approximately normal
- Details of this approach in Appendix B.1
- If features are normally distributed but data not centred, need to include intercept parameter and modify hierarchical model in Appendix B.2
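The averaging idea can be sketched as follows: grouping rows and averaging both features and responses keeps the linear relationship intact, while the central limit theorem pushes the averaged features towards normality. The group size k, the helper name, and the exponential features are hypothetical; the details in Appendix B.1 may differ.

```python
import numpy as np

def average_in_groups(X, y, k):
    """Average consecutive groups of k rows; leftover rows are dropped."""
    m = X.shape[0] // k
    X_bar = X[: m * k].reshape(m, k, -1).mean(axis=1)
    y_bar = y[: m * k].reshape(m, k).mean(axis=1)
    return X_bar, y_bar

rng = np.random.default_rng(5)
n, d, k = 10000, 2, 20
theta = np.array([0.8, -0.3])                    # hypothetical true coefficients
X = rng.exponential(size=(n, d)) - 1.0           # centred but clearly non-normal features
y = X @ theta + 0.5 * rng.normal(size=n)

X_bar, y_bar = average_in_groups(X, y, k)
# The averaged pairs still follow y_bar = X_bar @ theta + noise, with noise variance 0.5**2 / k
```

The price of averaging is a smaller effective sample (n // k rows instead of n), traded for features that fit the normality assumption better.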
Numerical experiments
- MCMC-normalX, MCMC-fixedS, and Bayes-fixedS-fast algorithms are evaluated numerically
- Compared to adaSSP of Wang (2018) and MCMC method of Bernstein and Sheldon (2019)
- Extensions of adaSSP and MCMC-B&S for J ≥ 1 implemented
- Model in Bernstein and Sheldon (2019) generalised for J ≥ 1
- Code to replicate experiments available at given URL
Experiments with simulated data
- Considered two different configurations for problem size
- Generated data with certain parameters
- Used same parameters for inference
- Evaluated methods at different combinations of J and ε
- Ran MCMC algorithms for 10,000 iterations
- Looked at mean squared errors of estimates and predictions
- MCMC-fixedS and Bayes-fixedS-fast outperformed adaSSP and MCMC-B&S
- MCMC-normalX better at d = 2, MCMC-B&S better at d = 5
- All methods improved as ε grows
- Compared computation times of MCMC algorithms
- MCMC-B&S slowed down by O(d^6) computations
Experiments with real data
- Used four different data sets from UCI Machine Learning Repository
- Disregarded columns with string data or key values
- Took the right-most column as y
- 80% of data used for training, 20% for testing
- Average prediction performances presented in Table 1
- MCMC-fixedS and Bayes-fixedS-fast most stable
- MCMC-fixedS and Bayes-fixedS-fast beat adaSSP and MCMC-B&S when J > 1
Conclusion
- Propose a novel Bayesian inference framework for a differentially private distributed linear regression setting
- Exploit the conditional structure between the summary statistics of linear regression
- Numerical experiments show proposed methods are competitive with state-of-the-art alternatives
- Room for improvement of MCMC-normalX
- Full conditional distributions of Σ_x and θ
- Acceptance ratio for the MH update of S_j and σ_y^2
- Extensions mentioned in Section 4.4 indicate potential future directions
- Extension of Bernstein and Sheldon (2019) suited to observations
- Model includes b = x, Σx, and S0
- Extension of adaSSP (Wang, 2018) for J ≥ 1
- Calculate D × 1 mean vector and D × D covariance matrix