Data from the Decennial Census is published only after applying a disclosure avoidance system (DAS). Data users were shaken by the adoption of differential privacy in the 2020 DAS, a radical departure from past methods. The goal of this paper is to better understand how the perturbations from the 2020 DAS combine with sharp legal thresholds to affect redistricting. We consider two redistricting settings in which a data user might be concerned about the impacts of privacy-preserving noise: drawing equal-population districts and litigating voting rights cases. What discrepancies arise if the user does nothing to account for disclosure avoidance? How can the discrepancies be understood and accounted for? We study these questions by comparing the official 2010 Redistricting Data to the 2010 Demonstration Data (created using the 2020 DAS) in an analysis of millions of algorithmically generated state legislative redistricting plans. We find that thresholding can amplify the impact of the noise from disclosure avoidance. Large discrepancies do occur, but in ways that are well captured by simple models and that appear possible to account for. We demonstrate the utility of these models by proposing an approach to mitigate discrepancies when balancing district populations. At least for state legislatures, Alabama's claim that differential privacy "inhibits a State's right to draw fair lines" lacks support.
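To make the thresholding effect concrete, the following minimal sketch checks a population-balance rule against two versions of the same plan. It is an illustration under stated assumptions, not the paper's method: the district populations are hypothetical, the function names are invented here, and the 10% top-to-bottom deviation limit is assumed as a commonly cited rule of thumb for state legislative plans.

```python
def total_deviation(district_pops):
    """Top-to-bottom population deviation as a fraction of the ideal district size."""
    ideal = sum(district_pops) / len(district_pops)
    return (max(district_pops) - min(district_pops)) / ideal


def is_balanced(district_pops, threshold=0.10):
    """Binary test: does the plan stay within the (assumed) deviation threshold?"""
    return total_deviation(district_pops) <= threshold


# Hypothetical four-district plan; none of these counts come from Census data.
official = [47_600, 52_500, 50_000, 49_900]
# The same plan after small hypothetical perturbations (at most 150 people per district).
perturbed = [47_450, 52_600, 50_000, 49_950]

print(total_deviation(official), is_balanced(official))
print(total_deviation(perturbed), is_balanced(perturbed))
```

In this toy case no district changes by more than 0.3% of the ideal size, yet the plan moves from roughly 9.8% to roughly 10.3% total deviation and flips from passing to failing. A plan drawn close to a sharp threshold can therefore show a large discrepancy in outcome from a small discrepancy in counts.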