Abstract

  • Addresses the problem of computing differentially private approximate histograms and heavy hitters in a stream of elements
  • In the non-private setting, this is often done using the sketch of Misra and Gries [Science of Computer Programming, 1982]
  • Chan, Li, Shi, and Xu [PETS 2012] describe a differentially private version of the Misra-Gries sketch
  • The amount of noise added by that version scales linearly with the size of the sketch
  • We present a better mechanism for releasing Misra-Gries sketch under $(\varepsilon,\delta)$-differential privacy
  • The noise magnitude is independent of the sketch size
  • The maximum error from the noise matches the best known in the private non-streaming setting, up to a constant factor
  • The mechanism is simple and likely to be practical
  • A post-processing step of the Misra-Gries sketch is also given that does not increase the worst-case error guarantee
  • For the post-processed sketch, noise of less than twice the magnitude used in the non-streaming setting suffices

Paper Content

Introduction

  • Computing the histogram of a dataset is a fundamental task in data analysis
  • Differentially private algorithms exist to compute the histogram
  • These algorithms are not practical when the amount of data is large
  • Non-private approximate histograms are often computed using the Misra-Gries (MG) sketch
  • The MG sketch returns approximate frequencies with an optimal error
  • This paper develops a way of releasing an MG sketch in a differentially private way while adding only a small amount of noise
  • This allows for efficient and accurate approximate histograms while not violating users’ privacy
  • This improves upon the work of Chan et al. [11] who need a greater amount of noise
  • The issue with making approximation algorithms differentially private is that the algorithm itself may have a large global sensitivity
  • This paper exploits the structure of the difference between the MG sketches for neighboring inputs to prove that a simple mechanism ensures $(\varepsilon,\delta)$-differential privacy
  • This algorithm achieves a maximum error of $n/(k+1) + O(\log(1/\delta)/\varepsilon)$ with high probability
  • This is asymptotically optimal for approximate and pure differential privacy
  • The techniques used in this paper could also be used to get approximate differential privacy, but with weaker guarantees
  • This paper can replace the algorithm of Chan et al. [11] as a subroutine, leading to better results
  • Another approach that can be used is to use a randomized frequency oracle, but it seems hard to do this with the optimal error size

Technical overview

  • The Misra-Gries sketch stores up to $k$ elements
  • Each stored item has an associated counter
  • When processing an element, one of three updates is done
  • Contributions of paper are to release MG sketch in a differentially private way
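  • The MG sketches of two neighboring data streams differ only in a structured way
  • Adding noise of magnitude $k/\varepsilon$ to each count achieves pure DP
  • Adding noise of magnitude $O(1/\varepsilon)$ to each count achieves $(\varepsilon,\delta)$-DP
  • Noise is added twice: independently to each counter, and as one shared value added to all counters
  • The $\ell_1$-sensitivity of the resulting representation is less than 2
  • Adding noise from $\mathrm{Laplace}(2/\varepsilon)^{\otimes k+1}$ results in $\varepsilon$-differential privacy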
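
The "noise added twice" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 2: the helper names `laplace` and `release_noisy_sketch` are hypothetical, and the threshold is left as a parameter whose example value below is an assumption that should be checked against the paper before any real use. The sketch also samples floating-point noise, so it is not hardened against precision-based attacks.

```python
import math
import random


def laplace(scale, rng):
    """Sample Laplace(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_noisy_sketch(counters, epsilon, threshold, rng=None):
    """Hypothetical helper: noise each counter twice, then drop small noisy counts."""
    if rng is None:
        rng = random.Random()
    shared = laplace(1.0 / epsilon, rng)  # one draw, added to every counter
    noisy = {
        key: count + laplace(1.0 / epsilon, rng) + shared  # plus independent noise
        for key, count in counters.items()
    }
    # Keep only keys whose noisy count reaches the threshold.
    return {key: value for key, value in noisy.items() if value >= threshold}


# Illustrative parameters (assumptions, not taken from the summary above).
epsilon, delta = 1.0, 1e-6
threshold = 1 + 2 * math.log(3 / delta) / epsilon
released = release_noisy_sketch({"a": 1000, "b": 2}, epsilon, threshold)
```

Both noise terms have scale $1/\varepsilon$, so the added error does not grow with the number of counters $k$.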

Preliminaries

  • U is a totally ordered set of size N
  • Given a stream of elements from U, the goal is to estimate the frequency of each element
  • Differential privacy is a definition for describing the privacy loss of a randomized mechanism
  • The Laplace distribution is used in many differentially private algorithms
  • Chan et al. show that the global $\ell_1$-sensitivity of a Misra-Gries sketch is $\Delta_1 = k$
  • Privacy is achieved by adding Laplace noise with scale $k/\varepsilon$ to all elements in the universe
  • Böhler and Kerschbaum use secure multi-party computation to add noise to the counters of a Misra-Gries sketch.
  • Balcer and Vadhan provide a lower bound for expected error.
  • Heavy hitters problem has been studied in local differential privacy.
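
For reference, the standard definitions behind these bullets can be written out as below. The symbols $\mathcal{M}$ (the mechanism), $S \sim S'$ (neighboring streams), $Z$ (a set of outputs), and $g$ (the function being released) are notation assumed here, not taken from the summary.

```latex
% (\varepsilon,\delta)-differential privacy: for all neighboring S \sim S' and all output sets Z
\Pr[\mathcal{M}(S) \in Z] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(S') \in Z] + \delta

% Density of the Laplace distribution with scale b
p_b(x) = \frac{1}{2b} \, e^{-|x|/b}

% Laplace mechanism: if g has \ell_1-sensitivity \Delta_1, i.e.
% \max_{S \sim S'} \lVert g(S) - g(S') \rVert_1 \le \Delta_1, then releasing
g(S) + (\eta_1, \dots, \eta_d), \qquad \eta_i \sim \mathrm{Laplace}(\Delta_1 / \varepsilon) \ \text{i.i.d.}
% satisfies \varepsilon-differential privacy
```

With $\Delta_1 = k$ this gives the noise scale $k/\varepsilon$ attributed to Chan et al. above.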

Differentially private Misra-Gries

  • Algorithm 1 presents a variant of the non-private Misra-Gries sketch
  • The algorithm processes elements of the stream one at a time
  • Three updates can be performed: incrementing a counter, decrementing all counters, or replacing an element with a count of zero
  • The algorithm never outputs an element that does not appear in the stream
  • Fact 4 states that the frequency estimates given by an MG sketch of size $k$, for $n$ being the input size, are in the range $[f(x) - n/(k+1), f(x)]$, where $f(x)$ is the true frequency of $x$; this error is optimal for any mechanism that returns a set of at most $k$ elements
  • Lemma 5 states that for neighboring streams, the sets of stored elements differ in at most two keys
  • The $\ell_1$-sensitivity for Misra-Gries sketches is $k$
  • There are nine combinations of processing an element that can lead to different states
  • The algorithm can be used with standard implementations of MG
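
The three updates listed above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the function name `misra_gries` is hypothetical, empty slots stand in for zero-count entries, and replacing the smallest zero-count key is an assumed tie-breaking rule.

```python
from collections import Counter


def misra_gries(stream, k):
    """Non-private Misra-Gries sketch holding at most k counters."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1  # update 1: increment the counter of a stored element
        elif len(counters) < k:
            counters[x] = 1  # a free slot behaves like a zero-count entry
        else:
            zero_keys = [key for key, count in counters.items() if count == 0]
            if zero_keys:
                # update 3: replace an element whose count is zero
                del counters[min(zero_keys)]  # assumed tie-breaking rule
                counters[x] = 1
            else:
                # update 2: decrement all counters; x itself is not stored
                for key in counters:
                    counters[key] -= 1
    # Drop zero-count entries so only elements seen in the stream are output.
    return {key: count for key, count in counters.items() if count > 0}


stream = list("aaaaabbbccd")
k = 2
sketch = misra_gries(stream, k)  # → {'a': 2}
true_counts = Counter(stream)
slack = len(stream) / (k + 1)
for x, f in true_counts.items():
    estimate = sketch.get(x, 0)
    assert f - slack <= estimate <= f  # the range stated in Fact 4
```

In this example $n = 11$ and $k = 2$, so every estimate is within $11/3$ of the true frequency and never exceeds it.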

Privatizing standard versions of Misra-Gries

  • Our mechanism relies on a variant of the Misra-Gries algorithm
  • Sketches for neighboring datasets can differ in up to $k$ keys
  • Algorithm 2 can be changed to handle elements with a count of zero by increasing the threshold

Tips for practitioners

  • Misra-Gries algorithm produces an associative array
  • Noise needs to be added to the array to ensure differential privacy
  • The underlying data structure can affect the order of keys in the associative array, so the released key order should not depend on the stream
  • Laplace distribution used to sample noise
  • Precision-based attacks still exist
  • Geometric mechanism or alternatives can be used instead of Laplace
  • Threshold in Algorithm 2 might need to be changed
  • Analysis for Lemma 8 is not tight

Pure differential privacy

  • Discusses how to achieve pure ($\varepsilon$-)differential privacy
  • Noise is added to all elements of U, scaled to the $\ell_1$-sensitivity
  • Sensitivity of Misra-Gries sketches scales with the number of counters
  • Post-processing step reduces sensitivity to 2
  • Worst-case error guarantee is still $n/(k+1)$
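
To make the effect of the reduced sensitivity concrete, the Laplace scales before and after post-processing can be compared. The values $k = 1000$ and $\varepsilon = 1$ are illustrative assumptions, and the sensitivity bound of 2 is the one stated in the bullets above.

```latex
% Laplace scale b = \Delta_1 / \varepsilon for the two sensitivities
b_{\text{sketch}} = \frac{k}{\varepsilon} = \frac{1000}{1} = 1000
\qquad\text{versus}\qquad
b_{\text{post}} = \frac{2}{\varepsilon} = \frac{2}{1} = 2
```

Under these assumed values the per-coordinate noise scale drops by a factor of $k/2 = 500$, while the sketch's own worst-case error term stays the same.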