The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measure File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping. While the estimated errors for both TopDown and swapping algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations.