The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of the bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the 1990, 2000, and 2010 Censuses. Our evaluation leverages the recent release of the Noisy Measurement File (NMF), as well as the availability of two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF alone contains too much noise to be directly useful, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces data whose bias and noise are comparable to those of swapping. These patterns hold across census geographies with varying population sizes and racial diversity. While the estimated errors for both TopDown and swapping are generally no larger than other sources of Census error, they can be relatively substantial for geographies with small total populations.
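Two independent runs make it possible to separate systematic error (bias) from run-to-run variability (noise). The following is a minimal sketch of one standard way to do this. It assumes the two runs are independent, share the same expected error for each geography, and have a common noise variance. The function name `bias_and_noise` and the specific estimators are illustrative assumptions, not necessarily the estimators used in this evaluation.

```python
import statistics


def bias_and_noise(truth, run_a, run_b):
    """Estimate the average bias and noise variance of a disclosure avoidance system.

    Hypothetical helper for illustration. truth, run_a, and run_b are
    equal-length sequences of counts for the same geographies. Assumes
    run_a and run_b are independent runs with the same expected error
    for each geography and a common noise variance.
    """
    if not (len(truth) == len(run_a) == len(run_b)):
        raise ValueError("inputs must have equal length")

    # Bias: average over geographies of the error of the two-run mean
    # relative to the true (confidential) count.
    errors = [(a + b) / 2 - t for t, a, b in zip(truth, run_a, run_b)]
    bias = statistics.fmean(errors)

    # Noise: if a and b are independent with variance s^2 and the same mean,
    # E[(a - b)^2] = 2 * s^2, so half the mean squared difference estimates s^2.
    # The shared bias cancels in a - b, so this step needs no true counts.
    sq_diffs = [(a - b) ** 2 for a, b in zip(run_a, run_b)]
    noise_var = statistics.fmean(sq_diffs) / 2

    return bias, noise_var
```

For example, `bias_and_noise([100, 50, 10], [102, 49, 12], [98, 53, 11])` returns roughly `(0.833, 5.5)`. The noise estimate uses only the two runs. This is why a second independent run is valuable even without access to the confidential counts.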