Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Problem of computing differentially private approximate histograms and heavy hitters in a stream of elements
- Misra and Gries [Science of Computer Programming, 1982] used in non-private setting
- Chan, Li, Shi, and Xu [PETS 2012] describe a differentially private version of the Misra-Gries sketch
- Amount of noise added scales linearly with size of sketch
- We present a better mechanism for releasing Misra-Gries sketch under $(\varepsilon,\delta)$-differential privacy
- Noise magnitude independent of sketch size
- Maximum error same as best known in private non-streaming setting
- Simple and likely to be practical
- Post-processing step of Misra-Gries sketch does not increase worst-case error guarantee
- Noise magnitude less than twice the magnitude of the non-streaming setting
Paper Content
Introduction
- Computing the histogram of a dataset is a fundamental task in data analysis
- Differentially private algorithms exist to compute the histogram
- These algorithms are not practical when the amount of data is large
- Non-private approximate histograms are often computed using the Misra-Gries (MG) sketch
- The MG sketch returns approximate frequencies with an optimal error
- This paper develops a way of releasing a MG sketch in a differentially private way while adding only a small amount of noise
- This allows for efficient and accurate approximate histograms while not violating users’ privacy
- This improves upon the work of Chan et al. [11] who need a greater amount of noise
- The issue with making approximation algorithms differentially private is that the algorithm itself may have a large global sensitivity
- This paper exploits the structure of the difference between the MG sketches for neighboring inputs to prove that a simple mechanism ensures (, )-differential privacy
- This algorithm satisfies the guarantees of maximum error of /( + 1) + (log(1/)/) with high probability
- This is asymptotically optimal for approximate and pure differential privacy
- The techniques used in this paper could also be used to get approximate differential privacy, but with weaker guarantees
- This paper can replace the algorithm of Chan et al. [11] as a subroutine, leading to better results
- Another approach that can be used is to use a randomized frequency oracle, but it seems hard to do this with the optimal error size
Technical overview
- Misra-Gries sketch stores up to elements
- Each stored item has an associated counter
- When processing an element, one of three updates is done
- Contributions of paper are to release MG sketch in a differentially private way
- Two neighboring data streams have same state of MG sketch
- Adding noise of magnitude / to each count achieves pure DP
- Adding noise of magnitude (1/) to each count achieves (, )-DP
- Noise added twice, independently and to all counters
- ℓ 1 -sensitivity of representation is < 2
- Adding noise from Laplace(2/) ⊗+1 results in -differential privacy
Preliminaries
- U is a totally ordered set of size N
- Given a stream of elements from U, the goal is to estimate the frequency of each element
- Differential privacy is a definition for describing the privacy loss of a randomized mechanism
- Laplace distribution is used in many differential private algorithms
Related work
- Chan et al. show that the global ℓ1-sensitivity of a Misra-Gries sketch is Δ1.
- Privacy is achieved by adding Laplace noise with scale / to all elements in the universe.
- Böhler and Kerschbaum use secure multi-party computation to add noise to the counters of a Misra-Gries sketch.
- Balcer and Vadhan provide a lower bound for expected error.
- Heavy hitters problem has been studied in local differential privacy.
Differentially private misra-gries
- Algorithm 1 presents a variant of the non-private Misra-Gries sketch
- The algorithm processes elements of the stream one at a time
- Three updates can be performed: incrementing a counter, decrementing all counters, or replacing an element with a count of zero
- The algorithm guarantees that no elements not in the stream are output
- Fact 4 states that for any mechanism that returns a set of at most elements, the frequency estimates given by an MG sketch of size for being the input size are in the range [ () − /( + 1), ()]
- Lemma 5 states that for neighboring streams, the sets of stored elements differ in at most two keys
- The ℓ 1 -sensitivity for Misra-Gries sketches is
- There are nine combinations of processing an element that can lead to different states
- The algorithm can be used with standard implementations of MG
Privatizing standard versions of misra-gries
- Our mechanism relies on a variant of the Misra-Gries algorithm
- Sketches for neighboring datasets can differ for up to k keys
- Algorithm 2 can be changed to handle elements with a count of zero by increasing the threshold
Tips for practitioners
- Misra-Gries algorithm produces an associative array
- Noise needs to be added to the array to ensure differential privacy
- Order of keys in associative array can affect data structure
- Laplace distribution used to sample noise
- Precision-based attacks still exist
- Geometric mechanism or alternatives can be used instead of Laplace
- Threshold in Algorithm 2 might need to be changed
- Analysis for Lemma 8 is not tight
Pure differential privacy
- Discussing how to achieve differential privacy
- Adding noise to all elements of U scaled to the ℓ 1 -sensitivity
- Sensitivity of Misra-Gries sketches scales with the number of counters
- Post-processing step reduces sensitivity to 2
- Worst-case error guarantee is still /( + 1)