Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. A central result in differential privacy is that a differentially private mechanism run on a simple random sample of a population provides stronger privacy guarantees than the same mechanism run on the entire population. In practice, however, sampling designs are often more complex than the simple, data-independent schemes addressed in prior work. In this work, we extend the study of privacy amplification to more complex, data-dependent sampling schemes. We find that these schemes not only often fail to amplify privacy but can actually degrade it. We analyze the privacy implications of the pervasive cluster sampling and stratified sampling paradigms, and provide some insight into more general sampling designs.
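To make the baseline amplification result concrete, the following is a minimal sketch of the standard bound for simple, data-independent subsampling, not the analysis of cluster or stratified sampling developed in this work. It assumes an ε-differentially private mechanism applied to a sample in which each record is included with probability q (Poisson sampling, or simple random sampling without replacement with q = m/n under the substitution notion of neighboring datasets), for which the amplified parameter is ε' = log(1 + q(e^ε − 1)). The function name `amplified_epsilon` is hypothetical and chosen only for illustration.

```python
import math


def amplified_epsilon(epsilon: float, sampling_rate: float) -> float:
    """Privacy parameter after data-independent subsampling.

    Assumes the base mechanism is epsilon-DP and that each record enters
    the sample with probability `sampling_rate` (an illustrative assumption,
    not a setting taken from the abstract).
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must lie in [0, 1]")
    # log(1 + q * (e^eps - 1)), computed with log1p/expm1 for numerical stability
    return math.log1p(sampling_rate * math.expm1(epsilon))


if __name__ == "__main__":
    for q in (1.0, 0.5, 0.1, 0.01):
        print(f"q={q:<5} eps=1.0 -> eps'={amplified_epsilon(1.0, q):.4f}")
```

With q = 1 the bound returns the original ε, and with ε = 1 and q = 0.1 it gives roughly 0.16. The data-dependent designs studied here do not satisfy the assumptions behind this formula, which is why they can fail to amplify privacy or even degrade it.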