Differentially private mechanisms achieving worst-case optimal error bounds (e.g., the classical Laplace mechanism) are well-studied in the literature. However, when typical data are far from the worst case, \emph{instance-specific} error bounds -- which depend on the largest value in the dataset -- are more meaningful. For example, consider the sum estimation problem, where each user holds an integer $x_i$ from the domain $\{0,1,\dots,U\}$ and we wish to estimate $\sum_i x_i$. This problem has a worst-case optimal error of $O(U/\varepsilon)$, while recent work has shown that the clipping mechanism achieves an instance-optimal error of $O(\max_i x_i \cdot \log\log U /\varepsilon)$. Under the shuffle model, known instance-optimal protocols are less communication-efficient. The clipping mechanism also works in the shuffle model, but requires two rounds: the first round finds the clipping threshold, and the second clips the data and computes the noisy sum of the clipped values. In this paper, we show how these two seemingly sequential steps can be carried out simultaneously in a single round using just $1+o(1)$ messages per user, while preserving the instance-optimal error bound. We also extend our technique to high-dimensional sum estimation and sparse vector aggregation (a.k.a. frequency estimation under user-level differential privacy). Our experiments show order-of-magnitude improvements in error over prior work.
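To make the role of the clipping threshold concrete, here is a minimal sketch of the (single-party) clipping mechanism the abstract refers to: each value is clipped at a threshold before summing, so the per-user sensitivity becomes the threshold rather than the domain size $U$, and Laplace noise is calibrated accordingly. The function and parameter names (`clipped_noisy_sum`, `tau`, `eps`) are illustrative, not from the paper, and this sketch omits the private threshold-selection step that the paper's one-round protocol performs simultaneously.

```python
import math
import random

def clipped_noisy_sum(data, tau, eps):
    """Clip each value at tau, sum, and add Laplace(tau/eps) noise.

    Clipping bounds each user's contribution by tau instead of the
    domain size U, which is what yields error scaling with the data's
    actual magnitude rather than with U. Choosing tau privately (near
    max_i x_i) is the nontrivial step handled by the full protocol.
    """
    clipped_sum = sum(min(x, tau) for x in data)
    # Sample Laplace(b) noise with scale b = tau / eps via inverse CDF.
    b = tau / eps
    u = random.random() - 0.5
    noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return clipped_sum + noise
```

With a very large `eps` the noise is negligible, so the output is close to the clipped sum; with realistic `eps`, the noise scale `tau/eps` is what the instance-optimal bound controls by keeping `tau` close to the largest data value.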