Differentially Private $L_2$-Heavy Hitters in the Sliding Window Model

The data management of large companies often prioritizes more recent data, which tends to yield more accurate predictions than outdated data. For example, the Facebook data policy retains user search histories for $6$ months, while the Google data retention policy states that browser information may be stored for up to $9$ months. These policies are captured by the sliding window model, in which only the most recent $W$ statistics form the underlying dataset. In this paper, we consider the problem of privately releasing the $L_2$-heavy hitters in the sliding window model. The $L_2$-heavy hitters include the $L_p$-heavy hitters for $p\le 2$ and are, in some sense, the strongest guarantee that can be achieved using polylogarithmic space, but they cannot be handled by existing techniques due to the sub-additivity of the $L_2$ norm. Moreover, existing non-private sliding window algorithms use the smooth histogram framework, which has high sensitivity. To overcome these barriers, we introduce the first differentially private algorithm for $L_2$-heavy hitters in the sliding window model by initiating a number of $L_2$-heavy hitter algorithms across the stream with a significantly lower threshold. Similarly, we augment the algorithms with an approximate frequency tracking algorithm with significantly higher accuracy. We then use smooth sensitivity and statistical distance arguments to show that we can add noise proportional to an estimate of the $L_2$ norm. To the best of our knowledge, our techniques are the first to privately release statistics related to a sub-additive function in the sliding window model, and they may be of independent interest for the future design of differentially private algorithms in the sliding window model.
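To make the problem statement concrete, the following is a minimal sketch of the quantity being released, not of the private algorithm itself. It assumes the standard definition that an item $i$ is an $\alpha$-$L_2$-heavy hitter when its frequency $f_i$ among the most recent $W$ updates satisfies $f_i \ge \alpha \|f\|_2$. The function names, the parameter `alpha`, and the toy stream are hypothetical, and the computation is exact, non-private, and stores the whole window, so it uses linear rather than polylogarithmic space.

```python
from collections import Counter
from math import sqrt


def window_frequencies(stream, window):
    """Frequencies of items among the last `window` updates of `stream`."""
    if window <= 0:
        raise ValueError("window must be positive")
    return Counter(stream[-window:])


def l2_heavy_hitters(stream, window, alpha):
    """Items with f_i >= alpha * ||f||_2 in the active window (exact, non-private)."""
    freq = window_frequencies(stream, window)
    l2_norm = sqrt(sum(c * c for c in freq.values()))
    return {item for item, c in freq.items() if c >= alpha * l2_norm}


def l1_heavy_hitters(stream, window, alpha):
    """Items with f_i >= alpha * ||f||_1 in the active window (exact, non-private)."""
    freq = window_frequencies(stream, window)
    l1_norm = sum(freq.values())
    return {item for item, c in freq.items() if c >= alpha * l1_norm}


# Toy stream: 1000 expired updates of "x", then an active window of W = 100
# updates in which "a" appears 10 times and 90 other items appear once each.
stream = ["x"] * 1000 + ["a"] * 10 + [f"u{i}" for i in range(90)]

l2_hh = l2_heavy_hitters(stream, window=100, alpha=0.5)
l1_hh = l1_heavy_hitters(stream, window=100, alpha=0.5)
```

In this toy window, $\|f\|_2 = \sqrt{190} \approx 13.8$ while $\|f\|_1 = 100$, so at the same threshold "a" is an $L_2$-heavy hitter but not an $L_1$-heavy hitter, and the expired item "x" is never reported. Because $\|f\|_2 \le \|f\|_1$, every $L_1$-heavy hitter is also an $L_2$-heavy hitter at the same $\alpha$, which is the sense in which the $L_2$ guarantee is the stronger one.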
