The U.S. Census Bureau collects and publishes detailed demographic data about Americans, which are heavily used by researchers and policymakers. The Bureau has recently adopted the framework of differential privacy in an effort to improve the confidentiality of individual census responses. A key output of this privacy protection system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The NMF is critical to understanding any biases in the data and to performing valid statistical inference on published census data. Unfortunately, the current release format of the NMF is difficult to access and work with. We describe the process we use to transform the NMF into a usable format, and provide recommendations to the Bureau for how to release future versions of the NMF. These changes are essential for ensuring the transparency of privacy measures and the reproducibility of scientific research built on census data.