A Refreshment Stirred, Not Shaken: Invariant-Preserving Deployments of Differential Privacy for the U.S. Decennial Census

Protecting an individual's privacy when releasing their data is inherently an exercise in relativity, regardless of how privacy is qualified or quantified. This is because we can only limit the gain in information about an individual relative to what could be derived from other sources. This framing is the essence of differential privacy (DP), through which this article examines two statistical disclosure control (SDC) methods for the United States Decennial Census: the Permutation Swapping Algorithm (PSA), which resembles the 2010 Census's disclosure avoidance system (DAS), and the TopDown Algorithm (TDA), which was used in the 2020 DAS. To varying degrees, both methods leave certain statistics of the confidential data unaltered (their invariants), and hence neither can be readily reconciled with DP, at least as originally conceived. Nevertheless, we show how invariants can naturally be integrated into DP and use this to establish that the PSA satisfies pure DP subject to the invariants it necessarily induces, thereby proving that this traditional SDC method can, in fact, be understood from the perspective of DP. By a similar modification to zero-concentrated DP, we also provide a DP specification for the TDA. Finally, as a point of comparison, we consider a counterfactual scenario in which the PSA was adopted for the 2020 Census, resulting in a reduction in the nominal protection loss budget but at the cost of releasing many more invariants. This highlights the pervasive danger of comparing budgets without accounting for the other dimensions on which DP formulations vary (such as the invariants they permit). Therefore, while our results articulate the mathematical guarantees of SDC provided by the PSA, the TDA, and the 2020 DAS in general, care must be taken in translating these guarantees into actual privacy protection, just as is the case for any DP deployment.
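To make the notion of an invariant concrete, the following is a toy Python sketch of swapping, under the assumption of a simplified scheme in which a random subset of records has its geography permuted among records that share the same swap key. It is not the PSA as specified in the article: the record layout, the function `toy_swap`, and the parameter `swap_prob` are hypothetical. Whichever records happen to be selected, certain tabulations of the data (here, geography-by-key counts, key-by-attribute counts, and per-geography totals) come out exactly as in the confidential data. Statistics that are preserved this way are what the abstract calls invariants.

```python
import random
from collections import Counter

# Hypothetical toy records: (geography, swap_key, other_attribute).
records = [
    ("block_A", 3, "x"),
    ("block_A", 2, "y"),
    ("block_B", 3, "z"),
    ("block_B", 2, "w"),
    ("block_C", 3, "v"),
]

def toy_swap(records, swap_prob, rng):
    """Hypothetical simplified swap: within each swap-key class, select
    records independently with probability swap_prob and randomly permute
    the geographies of the selected records. Keys and other attributes
    stay attached to their original records."""
    out = list(records)
    by_key = {}
    for i, (_, key, _) in enumerate(records):
        by_key.setdefault(key, []).append(i)
    for idxs in by_key.values():
        chosen = [i for i in idxs if rng.random() < swap_prob]
        geos = [records[i][0] for i in chosen]
        rng.shuffle(geos)  # permute geographies among the selected records
        for i, new_geo in zip(chosen, geos):
            _, key, attr = records[i]
            out[i] = (new_geo, key, attr)
    return out

def invariants(data):
    """Tabulations that this toy scheme leaves unchanged."""
    return (
        Counter((g, k) for g, k, _ in data),  # geography x swap key
        Counter((k, a) for _, k, a in data),  # swap key x other attribute
        Counter(g for g, _, _ in data),       # records per geography
    )

swapped = toy_swap(records, swap_prob=0.5, rng=random.Random(0))
assert invariants(swapped) == invariants(records)
```

Other tabulations, such as geography by `other_attribute`, can change under the swap, which is where the randomness and hence the protection come from in this toy setting.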

翻译：在发布个体数据时保护其隐私本质上是一种相对性实践，无论隐私如何被定性或量化。这是因为我们只能限制相对于其他来源可推导信息而言的个体信息增益。这种框架正是差分隐私（DP）的核心思想，本文通过该框架审视美国十年人口普查中两种统计披露控制（SDC）方法：近似于2010年普查披露避免系统（DAS）的置换交换算法（PSA），以及应用于2020年DAS的TopDown算法（TDA）。这两种方法在不同程度上都保留了机密数据的某些统计量（其不变量），因此都无法与原始构想的DP直接兼容。尽管如此，我们展示了如何将不变量自然地整合到DP框架中，并以此证明PSA在必然产生的不变量约束下满足纯DP，从而证实这种传统SDC方法实际上可以从DP视角进行理解。通过对零集中差分隐私的类似修正，我们也为TDA提供了DP规范。最后，作为对比参照，我们考虑了一个假设场景：若2020年普查采用PSA方案，虽会降低名义保护损失预算，但代价是释放更多不变量。这突显了在比较预算时若不考虑DP公式其他变化维度（如允许的不变量）将普遍存在的风险。因此，尽管我们的研究结果明确了PSA、TDA及2020年DAS所提供的SDC数学保证，但在将这些保证转化为实际隐私保护时仍需审慎对待——正如任何DP部署所要求的那样。