As the U.S. Census Bureau implements its controversial new disclosure avoidance system, researchers and policymakers debate the necessity of new privacy protections for public statistics. With experiments on both published statistics and synthetic data, we explore a particular privacy concern: respondents in subsidized housing may deliberately omit unauthorized children and other household members for fear of eviction. By combining public statistics from the Decennial Census and the Department of Housing and Urban Development, we demonstrate a simple, inexpensive reconstruction attack that could identify subsidized households living in violation of occupancy guidelines in 2010. Experiments on synthetic data suggest that a random swapping mechanism similar to the Census Bureau's 2010 disclosure avoidance measures does not significantly reduce the precision of this attack, while a differentially private mechanism similar to the 2020 disclosure avoidance system does. Our results provide a valuable example for policymakers seeking a trustworthy, accurate census.