Differential privacy is the de facto privacy standard in data analysis. The classic model of differential privacy considers the data to be static. The dynamic setting, called differential privacy under continual observation, captures many applications more realistically. In this work we consider several natural dynamic data structure problems under continual observation, where we want to maintain information about a changing data set such that we can answer certain sets of queries at any given time while satisfying $\epsilon$-differential privacy. The problems we consider include (a) maintaining a histogram and various extensions of histogram queries such as quantile queries, (b) maintaining a predecessor search data structure of a dynamically changing set in a given ordered universe, and (c) maintaining the cardinality of a dynamically changing set. For (a), we give new error bounds parameterized by the maximum output of any query, $c_{\max}$: our algorithm gives an upper bound of $O(d\log^2dc_{\max}+\log T)$ for computing the histogram, the maximum and minimum column sum, quantiles on the column sums, and related queries. The bound holds for unknown $c_{\max}$ and $T$. For (b), we give a general reduction to orthogonal range counting. Further, we give an improvement for the case where only insertions are allowed: we obtain a data structure that, for a given query, returns an interval that contains the predecessor and at most $O(\log^2 u \sqrt{\log T})$ additional elements, where $u$ is the size of the universe. The bound holds for unknown $T$. Lastly, for (c), we give a parameterized upper bound of $O(\min(d,\sqrt{K\log T}))$, where $K$ is an upper bound on the number of updates, and we show a matching lower bound. Finally, we show how to extend the bound for (c) to unknown $K$ and $T$.