Differentially Private Data Structures under Continual Observation for Histograms and Related Queries

Binary counting under continual observation is a well-studied, fundamental problem in differential privacy. A natural extension is to maintain the column sums, also known as the histogram, of a stream of rows from $\{0,1\}^d$ and to answer queries about those sums, such as the maximum column sum or the median, while satisfying differential privacy. Jain et al. (2021) showed that computing the maximum column sum under continual observation with event-level differential privacy requires an error polynomial in either the dimension $d$ or the stream length $T$. On the other hand, no $o(d\log^2 T)$ upper bound for $\epsilon$-differential privacy and no $o(\sqrt{d}\log^{3/2} T)$ upper bound for $(\epsilon,\delta)$-differential privacy is known. In this work, we give new parameterized upper bounds for maintaining the histogram, the maximum column sum, quantiles of the column sums, and any set of at most $d$ low-sensitivity, monotone, real-valued queries on the column sums. Our solutions achieve an error of approximately $O(d\log^2 c_{\max}+\log T)$ for $\epsilon$-differential privacy and approximately $O(\sqrt{d}\log^{3/2}c_{\max}+\log T)$ for $(\epsilon,\delta)$-differential privacy, where $c_{\max}$ is the maximum value that the queries we want to answer can take on the given data set. Furthermore, we show that such an improvement is not possible under a slightly expanded notion of neighboring streams, for which we give a lower bound of $\Omega(d \log T)$. This explains why our improvement cannot be achieved with the existing mechanisms for differentially private histograms: they remain differentially private even under this expanded notion of neighboring streams.
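
For orientation, the kind of existing mechanism whose error depends on $T$ can be sketched with the classical binary-tree (dyadic interval) approach applied to each column. This is a minimal illustration of that baseline and not the mechanism proposed in this work. The class name `BinaryMechanismHistogram`, its parameters, and the even split of the privacy budget across tree levels are assumptions made for this example only.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class BinaryMechanismHistogram:
    """Baseline sketch: noisy dyadic partial sums for d column sums.

    Assumes event-level neighboring streams (one row differs). A row
    falls into one dyadic node per level, and it changes each node by at
    most d in L1 norm, so Laplace noise of scale d * levels / epsilon on
    every coordinate of every node gives epsilon-differential privacy.
    """

    def __init__(self, d, T, epsilon, seed=None):
        self.d = d
        self.T = T
        self.levels = max(1, T.bit_length())  # floor(log2 T) + 1 levels
        self.scale = d * self.levels / epsilon
        self.rng = random.Random(seed)
        self.t = 0
        self.exact = [[0] * d for _ in range(self.levels)]
        self.noisy = [[0.0] * d for _ in range(self.levels)]

    def update(self, row):
        """Consume one row from {0,1}^d and return noisy column sums."""
        if self.t >= self.T:
            raise ValueError("stream is longer than the declared T")
        if len(row) != self.d:
            raise ValueError("row has the wrong dimension")
        self.t += 1
        # Index of the lowest set bit of t: the level of the node that closes now.
        i = (self.t & -self.t).bit_length() - 1
        merged = list(row)
        for j in range(i):
            # Lower levels are absorbed into the new node and then cleared.
            for k in range(self.d):
                merged[k] += self.exact[j][k]
            self.exact[j] = [0] * self.d
            self.noisy[j] = [0.0] * self.d
        self.exact[i] = merged
        self.noisy[i] = [v + laplace(self.scale, self.rng) for v in merged]
        # The prefix [1, t] is covered by one node per set bit of t.
        estimate = [0.0] * self.d
        for j in range(self.levels):
            if (self.t >> j) & 1:
                for k in range(self.d):
                    estimate[k] += self.noisy[j][k]
        return estimate
```

Calling `update` once per row returns the current noisy histogram, and a query such as the maximum column sum can be estimated by taking `max` over the returned list. Every estimate sums up to `levels` noisy nodes, each with noise scale proportional to `d * levels / epsilon`, so the error of this baseline grows with both $d$ and $\log T$. That dependence on $T$ is what the parameterized bounds above replace with a dependence on $c_{\max}$.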
