The U.S. Decennial Census is the foundation for many high-profile policy decisions, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could stronger privacy guarantees be obtained for the 2020 U.S. Census than the published ones, or equivalently, had the privacy budgets been fully utilized? In this paper, we answer this question affirmatively by demonstrating that the 2020 U.S. Census provides significantly stronger privacy protections than its nominal guarantees suggest at each of the eight geographical levels, from the national level down to the block level. This finding is enabled by precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across these geographical levels. Our analysis reveals that the Census Bureau introduced more noise than necessary to meet the specified privacy guarantees for the 2020 Census. Consequently, we show that noise variances could be reduced by $15.08\%$ to $24.82\%$ while maintaining nearly the same level of privacy protection at each geographical level, thereby improving the accuracy of privatized census statistics. We empirically demonstrate that reducing the noise injected into census statistics mitigates the distortion that privacy constraints cause in downstream applications of private census data, illustrated through a study of the relationship between earnings and education.
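The accounting idea can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the paper's analysis or the Bureau's actual accounting. It assumes continuous Gaussian noise with unit sensitivity, which may differ from the noise used in production. In that case $f$-DP reduces to Gaussian DP (GDP): a mechanism with noise scale $\sigma$ is $\mu$-GDP with $\mu = 1/\sigma$, and composing queries gives $\sqrt{\sum_i \mu_i^2}$-GDP. The sketch compares the exact $(\varepsilon, \delta)$ curve of the composition with the looser standard zCDP conversion $\varepsilon = \rho + 2\sqrt{\rho \log(1/\delta)}$, where $\rho = \mu^2/2$, and then asks how much noise variance could be removed while staying within that looser $\varepsilon$. All function names and example parameters are hypothetical, and the resulting numbers are not comparable to the percentages reported above.

```python
from math import erfc, exp, log, sqrt


def _phi(x):
    """Standard normal CDF, computed via erfc for accuracy in the tails."""
    return 0.5 * erfc(-x / sqrt(2.0))


def compose_gdp(mus):
    """Composition of mu_i-GDP mechanisms is sqrt(sum mu_i^2)-GDP."""
    return sqrt(sum(m * m for m in mus))


def gdp_delta(mu, eps):
    """Exact delta(eps) satisfied by a mu-GDP mechanism."""
    return _phi(-eps / mu + mu / 2) - exp(eps) * _phi(-eps / mu - mu / 2)


def gdp_epsilon(mu, delta, iters=200):
    """Smallest eps with gdp_delta(mu, eps) <= delta, found by bisection."""
    if gdp_delta(mu, 0.0) <= delta:
        return 0.0
    lo, hi = 0.0, 1.0
    while gdp_delta(mu, hi) > delta:  # delta(eps) decreases in eps
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gdp_delta(mu, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


def zcdp_epsilon(rho, delta):
    """Standard (looser) conversion from rho-zCDP to (eps, delta)-DP."""
    return rho + 2.0 * sqrt(rho * log(1.0 / delta))


def max_mu_for_epsilon(eps, delta, lo=1e-3, hi=10.0, iters=200):
    """Largest mu in [lo, hi] whose exact GDP curve stays within (eps, delta)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gdp_delta(mid, eps) <= delta:  # delta grows with mu at fixed eps
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    delta = 1e-10                      # arbitrary illustrative value
    mu = compose_gdp([0.3, 0.4])       # two hypothetical queries
    eps_loose = zcdp_epsilon(mu * mu / 2, delta)
    eps_tight = gdp_epsilon(mu, delta)
    mu_max = max_mu_for_epsilon(eps_loose, delta)
    # Variance scales as 1/mu^2, so this is the removable share of variance.
    reduction = 1.0 - (mu / mu_max) ** 2
    print(f"loose eps={eps_loose:.3f}, tight eps={eps_tight:.3f}")
    print(f"variance could shrink by {100 * reduction:.1f}% in this toy setup")
```

The gap between the two $\varepsilon$ values is the slack that a tighter accountant exposes. Converting that slack back into a larger admissible $\mu$ is what allows the same nominal guarantee to be met with less noise.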