The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the three previous Censuses. Our evaluation leverages the Noisy Measure File (NMF) as well as two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces data whose accuracy is similar to that of swapping. While the estimated errors for both TopDown and swapping algorithms are generally no greater than other sources of Census error, they can be relatively substantial for geographies with small total populations.