Sampling is renowned for its privacy amplification in differential privacy (DP) and is often assumed to improve the utility of a DP mechanism, because the amplification allows the noise to be reduced. In this paper, we show that this assumption is flawed: when utility is measured at equal privacy levels, sampling as a preprocessing step consistently incurs a penalty due to the utility lost by omitting records. This holds across all canonical DP mechanisms (Laplace, Gaussian, exponential, and report noisy max) as well as recent applications of sampling, such as clustering. Extending this analysis, we investigate suppression as a generalized method of choosing, or omitting, records. We develop a theoretical analysis of this technique and derive privacy bounds for arbitrary suppression strategies under unbounded approximate DP. We find that the suppression strategy we tested also fails to improve the privacy-utility tradeoff. Surprisingly, uniform sampling emerges as one of the best suppression methods, even though it still degrades utility. Our results call into question common preprocessing assumptions in DP practice.
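To make the "equal privacy levels" comparison concrete, the following is a minimal sketch for a single counting query. It is not the paper's analysis. It assumes Poisson sampling with rate `q`, the standard amplification bound for pure DP under add/remove neighbors (an ε₀-DP mechanism run on the sample satisfies ln(1 + q(e^ε₀ − 1))-DP), a Laplace mechanism with sensitivity 1, and rescaling of the noisy sampled count by 1/q. All function names are hypothetical and chosen for illustration.

```python
import math


def amplified_epsilon(base_eps: float, q: float) -> float:
    """Overall epsilon after Poisson sampling with rate q (standard pure-DP bound)."""
    return math.log1p(q * math.expm1(base_eps))


def base_epsilon_for_target(target_eps: float, q: float) -> float:
    """Invert the bound: the per-sample epsilon that yields target_eps overall."""
    if not 0.0 < q <= 1.0:
        raise ValueError("sampling rate q must be in (0, 1]")
    return math.log1p(math.expm1(target_eps) / q)


def count_rmse(n: int, target_eps: float, q: float) -> float:
    """RMSE of the estimate (sampled_count + Laplace noise) / q for a true count n.

    Assumption: every one of the n records matches the query, so the sampled
    count is Binomial(n, q). With q = 1 this reduces to the plain Laplace mechanism.
    """
    eps0 = base_epsilon_for_target(target_eps, q)
    laplace_var = 2.0 / eps0 ** 2      # variance of Laplace(scale = 1 / eps0)
    sampling_var = n * q * (1.0 - q)   # variance of Binomial(n, q)
    return math.sqrt(laplace_var + sampling_var) / q


if __name__ == "__main__":
    n, eps = 1000, 1.0
    for q in (1.0, 0.5, 0.1):
        print(f"q={q:<4} base_eps={base_epsilon_for_target(eps, q):.3f} "
              f"rmse={count_rmse(n, eps, q):.3f}")
```

In this toy setting the amplified mechanism may use a larger ε₀ and therefore less noise on the sample, but the 1/q rescaling and the sampling variance are both charged against it, which is the kind of like-for-like accounting the abstract describes.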