The U.S. Census Bureau collects and publishes detailed demographic data about Americans, which are heavily used by researchers and policymakers. The Bureau has recently adopted the framework of differential privacy in an effort to improve the confidentiality of individual census responses. A key output of this privacy protection system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The NMF is critical both for understanding the errors introduced into the data and for performing valid statistical inference on published census data. Unfortunately, the current release format of the NMF is difficult to access and work with. We describe the process we use to transform the NMF into a usable format, and provide recommendations to the Bureau on how to release future versions of the NMF. These changes are essential for ensuring the transparency of privacy measures and the reproducibility of scientific research built on census data.