Privacy is a central challenge for systems that learn from sensitive data sets, especially when a system's outputs must be continuously updated to reflect changing data. We consider the achievable error for differentially private continual release of a basic statistic, the number of distinct items, in a stream where items may be both inserted and deleted (the turnstile model). With only insertions, existing algorithms achieve additive error that is only polylogarithmic in the length of the stream $T$. We uncover a much richer landscape in the turnstile model, even without considering memory restrictions. We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T^{1/4}$, even under a relatively weak, event-level privacy definition. We then identify a parameter of the input stream, its maximum flippancy, that is low for natural data streams and for which we give tight parameterized error guarantees. Specifically, the maximum flippancy is the largest number of times that the contribution of a single item to the distinct-elements count changes over the course of the stream. We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy $w$, continually outputs the number of distinct elements with $O(\sqrt{w} \cdot \operatorname{polylog} T)$ additive error, without requiring prior knowledge of $w$. We prove that this is the best achievable error bound that depends only on $w$, for a large range of values of $w$. When $w$ is small, the error of our mechanism matches the polylogarithmic-in-$T$ error of the insertion-only setting, bypassing the hardness of the turnstile model.
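To make the flippancy parameter concrete, the following is a minimal sketch (not from the paper, names hypothetical) of how one would compute the maximum flippancy of a turnstile stream, represented as a sequence of `(item, delta)` updates with `delta` in `{+1, -1}`. An item's contribution to the distinct-elements count flips whenever its multiplicity crosses between zero and nonzero.

```python
from collections import defaultdict

def max_flippancy(stream):
    """Maximum flippancy of a turnstile stream.

    `stream` is an iterable of (item, delta) pairs with delta in {+1, -1}.
    An item's flippancy is the number of times its presence indicator
    (multiplicity > 0) changes over the course of the stream; the maximum
    flippancy is the largest flippancy over all items.
    """
    counts = defaultdict(int)   # current multiplicity of each item
    flips = defaultdict(int)    # number of presence changes per item
    for item, delta in stream:
        present_before = counts[item] > 0
        counts[item] += delta
        present_after = counts[item] > 0
        if present_before != present_after:
            flips[item] += 1
    return max(flips.values(), default=0)
```

For example, the stream `[('a', 1), ('a', -1), ('a', 1), ('b', 1)]` has maximum flippancy 3: item `a` flips absent-to-present, present-to-absent, and absent-to-present again, while `b` flips once. In an insertion-only stream every item's flippancy is at most 1, which is consistent with the polylogarithmic error achievable in that setting.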