Differential privacy (DP) has gained substantial attention in recent years, most notably since the U.S. Census Bureau announced that it would adopt the concept for its 2020 Decennial Census. However, despite its attractive theoretical properties, implementing DP in practice remains challenging, especially for survey data. In this paper, we present results from an ongoing project funded by the U.S. Census Bureau that explores the possibilities and limitations of DP for survey data. Specifically, we identify five aspects that need to be considered when adopting DP in the survey context: the multi-staged nature of data production; the limited privacy amplification from complex sampling designs; the implications of survey-weighted estimates; the weighting adjustments for nonresponse and other data deficiencies; and the imputation of missing values. We summarize the project's key findings with respect to each of these aspects and discuss some of the challenges that still need to be addressed before DP could become the new data protection standard at statistical agencies.