In this paper, we study the problem of publishing a stream of real-valued data while satisfying differential privacy (DP). One major challenge is that the maximum possible value can be quite large, so the noise must be calibrated to a large range. It is therefore necessary to estimate a threshold and truncate values above it, which reduces the amount of noise that must be added to all the data. The threshold must be estimated from the data in a private fashion. We develop a method for this that uses the Exponential Mechanism with a quality function that closely approximates the utility goal while maintaining low sensitivity. Given the threshold, we then propose a novel online hierarchical method and several post-processing techniques. Building on these ideas, we formalize the steps into a framework for private publishing of stream data. Our framework consists of three components: a threshold optimizer that privately estimates the threshold, a perturber that adds calibrated noise to the stream, and a smoother that improves the result using post-processing. Within our framework, we also design an algorithm that satisfies the more stringent setting of DP called local DP (LDP). To our knowledge, this is the first LDP algorithm for publishing streaming data. Using four real-world datasets, we demonstrate that our mechanism outperforms the state of the art by 6-10 orders of magnitude in terms of utility (measured by the mean squared error of answering a random range query).
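To make the role of the threshold concrete, the following is a minimal Python sketch of the first two components (threshold optimizer and perturber), under stated assumptions. It is not the paper's algorithm: the quality function `percentile_quality` is a stand-in (a standard private-percentile score with sensitivity 1), the perturber adds independent Laplace noise per value rather than using the hierarchical method, and values are assumed to be non-negative. All function and parameter names are hypothetical.

```python
import math
import random


def exponential_mechanism(candidates, quality, epsilon, sensitivity=1.0, rng=random):
    """Pick one candidate with probability proportional to
    exp(epsilon * quality / (2 * sensitivity))."""
    scores = [quality(c) for c in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def percentile_quality(values, p):
    """Stand-in quality function (an assumption, not the paper's):
    prefers thresholds that sit near the p-th percentile of the data.
    Replacing one value changes the count by at most 1, so sensitivity is 1."""
    n = len(values)

    def quality(theta):
        below = sum(1 for v in values if v <= theta)
        return -abs(below - p * n)

    return quality


def laplace(scale, rng=random):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def truncate_and_perturb(stream, theta, epsilon, rng=random):
    """Clip each value to [0, theta], then add Laplace noise.
    After clipping, one value can change the output by at most theta,
    so the noise scale is theta / epsilon: a smaller threshold means less noise."""
    for v in stream:
        clipped = min(max(v, 0.0), theta)
        yield clipped + laplace(theta / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical data: mostly small values with one very large outlier.
    history = [rng.uniform(0, 100) for _ in range(1000)] + [1e6]
    candidates = [10.0 * i for i in range(1, 201)]  # assumed candidate grid
    theta = exponential_mechanism(
        candidates, percentile_quality(history, 0.95), epsilon=0.5, rng=rng
    )
    noisy = list(truncate_and_perturb([12.0, 80.0, 1e6], theta, epsilon=1.0, rng=rng))
    print(theta, noisy)
```

Without truncation, the noise scale in this sketch would have to be on the order of the largest possible value (here 1e6) divided by epsilon. With a privately chosen threshold it is `theta / epsilon`, at the cost of bias on the few truncated values.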