The U.S. Census Bureau's 2020 Disclosure Avoidance System (DAS) bases its output on noisy measurements, each of which is a population tabulation plus a realization of a mean-zero random variable. These noisy measurements are observed at each level of a geographic hierarchy, e.g., the U.S. as a whole, states, counties, census tracts, and census blocks. The Census Bureau released the noisy measurements generated in the DAS executions for the two primary 2020 Census data products, in part to allow data users to assess the uncertainty that disclosure avoidance introduced into 2020 Census tabulations. This paper describes an algorithm that leverages the hierarchical structure of the input data to compute very high-dimensional least squares estimates in a computationally efficient manner. We then show that this algorithm's output is equal to the generalized least squares estimator, describe how to find the variance of linear functions of this estimator, and provide a numerical experiment in which we compute confidence intervals for tabulations based on this estimator. We also describe an accompanying Census Bureau experimental data product that applies this estimator to the publicly available noisy measurements, giving data users the inputs required to derive confidence intervals for all tabulations included in the 2020 Redistricting Data File at the U.S., state, county, and census tract geographic levels.
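To make the estimator concrete, the following is a minimal sketch of generalized least squares on a toy two-level hierarchy (one parent geography with two children). It is not the efficient hierarchical algorithm the paper describes: it solves the dense normal equations directly, which is only feasible at toy scale. The function names, the measurement values, the noise variances, and the use of a normal approximation (z = 1.96) for the interval are all illustrative assumptions rather than details taken from the DAS.

```python
import numpy as np


def gls_estimate(design, y, variances):
    """GLS estimate and covariance for y = design @ x + noise.

    Assumes independent mean-zero noise with known per-measurement variances.
    """
    weights = 1.0 / np.asarray(variances, dtype=float)
    # Normal equations: (S' W S) x = S' W y, with W = diag(1 / variance).
    normal_matrix = design.T @ (weights[:, None] * design)
    covariance = np.linalg.inv(normal_matrix)
    estimate = covariance @ (design.T @ (weights * y))
    return estimate, covariance


def linear_ci(a, estimate, covariance, z=1.96):
    """Interval for the linear function a @ x (normal approximation, assumed)."""
    point = float(a @ estimate)
    half_width = z * float(np.sqrt(a @ covariance @ a))
    return point - half_width, point + half_width


# Toy hierarchy (hypothetical numbers): the unknowns are the two child counts.
# Row 0 measures the parent total, rows 1 and 2 measure each child directly.
design = np.array([[1.0, 1.0],
                   [1.0, 0.0],
                   [0.0, 1.0]])
y = np.array([103.0, 58.0, 41.0])      # noisy measurements
variances = np.array([1.0, 4.0, 4.0])  # known noise variances

estimate, covariance = gls_estimate(design, y, variances)

# The parent total is the linear function a @ x with a = (1, 1).
parent = np.array([1.0, 1.0])
parent_variance = float(parent @ covariance @ parent)
low, high = linear_ci(parent, estimate, covariance)
print(estimate, parent_variance, (low, high))
```

In this toy case the variance of the estimated parent total is 8/9, smaller than the variance of the direct parent measurement (1) and of the sum of the two child measurements (8), which is the gain from combining measurements across levels of the hierarchy.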