In privacy under continual observation, we study how to release differentially private estimates based on a dataset that evolves over time. The problem of releasing private prefix sums of $x_1,x_2,x_3,\dots \in\{0,1\}$ (where the value of each $x_i$ is to be kept private) is particularly well studied, and a generalized form is used in state-of-the-art methods for private stochastic gradient descent (SGD). The seminal binary mechanism privately releases the first $t$ prefix sums with noise of variance polylogarithmic in $t$. Recently, Henzinger et al. and Denisov et al. showed that it is possible to improve on the binary mechanism in two ways: the variance of the noise can be reduced by a (large) constant factor, and it can be made more even across time steps. However, their algorithms for generating the noise distribution are not as efficient as one would like in terms of computation time and, in particular, space. We address the efficiency problem by presenting a simple alternative to the binary mechanism in which 1) generating the noise takes constant average time per value, 2) the variance is reduced by a factor of about 4 compared to the binary mechanism, and 3) the noise distribution at each step is identical. Empirically, a simple Python implementation of our approach runs faster than the approach of Henzinger et al., and also faster than an attempt to improve their algorithm using high-performance algorithms for multiplication with Toeplitz matrices.
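For orientation, the following is a minimal sketch of the classic binary mechanism that the abstract takes as its baseline; it is not the alternative mechanism proposed here. The names `binary_mechanism`, `sigma`, and `rng` are illustrative, the choice of Gaussian noise is an assumption, and calibrating `sigma` to a privacy budget is deliberately left out.

```python
import random


def binary_mechanism(stream, sigma, rng=None):
    """Sketch of the binary mechanism for noisy prefix sums.

    stream: iterable of values x_1, x_2, ... (e.g. bits in {0, 1})
    sigma:  standard deviation of the noise added to each dyadic block
            (assumption: Gaussian noise, not calibrated to any privacy budget)
    rng:    optional random.Random instance, for reproducibility
    """
    rng = rng or random.Random()
    exact = []  # exact[j]: exact sum of the current dyadic block at level j
    noisy = []  # noisy[j]: the same block sum with noise added once
    outputs = []
    for t, x in enumerate(stream, start=1):
        # Index of the lowest set bit of t: the level of the block closing at t.
        i = (t & -t).bit_length() - 1
        while len(exact) <= i:
            exact.append(0.0)
            noisy.append(0.0)
        # Merge all lower-level blocks and x_t into one block at level i.
        exact[i] = sum(exact[:i]) + x
        for j in range(i):
            exact[j] = 0.0
            noisy[j] = 0.0
        noisy[i] = exact[i] + rng.gauss(0.0, sigma)
        # The prefix [1, t] is covered by one block per set bit of t.
        outputs.append(sum(noisy[j] for j in range(len(noisy)) if (t >> j) & 1))
    return outputs


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 1, 1, 1]
    print(binary_mechanism(bits, sigma=1.0, rng=random.Random(0)))
```

In this sketch the estimate at step $t$ adds up one noisy block per set bit in the binary representation of $t$, so its noise variance is `sigma**2` times the number of set bits. That number is at most $\lfloor \log_2 t \rfloor + 1$ and changes from step to step, which is the unevenness across time steps mentioned above.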