Differentially private (DP) mechanisms have been deployed in a variety of high-impact social settings, perhaps most notably by the U.S. Census. Because all DP mechanisms add noise to the results of statistical queries, they are expected to reduce our ability to accurately analyze and learn from data, in effect trading privacy off against utility. Alarmingly, the impact of DP on utility can vary significantly among sub-populations. A simple way to reduce this disparity is stratification: first compute an independent private estimate for each group in the data set (a group may be the intersection of several protected classes), then recombine these group estimates appropriately to obtain estimates of global statistics. Our main observation is that naive stratification often yields high-accuracy estimates of population-level statistics without requiring additional privacy budget. We support this observation theoretically and empirically. Our theoretical results center on the private mean estimation problem, while our empirical results come from extensive experiments on private data synthesis that demonstrate the effectiveness of stratification across a variety of private mechanisms. Overall, we argue that this straightforward approach provides a strong baseline against which future work on reducing the utility disparities of DP mechanisms should be compared.
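The stratification idea described above can be illustrated for private mean estimation. The following is a minimal sketch, not the authors' implementation. It assumes the Laplace mechanism, values clipped to a known range [lo, hi], and group membership and group sizes that are treated as public, so that changing one record's value affects only one group's estimate and every group can use the full epsilon. The function names `private_mean` and `stratified_private_mean` are hypothetical.

```python
import numpy as np


def private_mean(values, epsilon, lo, hi, rng):
    """Laplace-mechanism mean of values clipped to [lo, hi].

    Assumption: the number of values is public, so changing one value
    moves the clipped mean by at most (hi - lo) / n.
    """
    clipped = np.clip(np.asarray(values, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(clipped)
    return clipped.mean() + rng.laplace(0.0, sensitivity / epsilon)


def stratified_private_mean(values, groups, epsilon, lo, hi, rng):
    """Estimate each group's mean privately, then recombine.

    Assumption: groups are disjoint and their sizes are public, so each
    group uses the full epsilon and the sizes serve as recombination weights.
    Returns (per-group estimates, population-level estimate).
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    group_means = {}
    weighted_sum = 0.0
    for g in np.unique(groups):
        members = values[groups == g]
        estimate = private_mean(members, epsilon, lo, hi, rng)
        group_means[g.item()] = estimate
        # Weight each group estimate by its (public) size.
        weighted_sum += len(members) * estimate
    return group_means, weighted_sum / len(values)


# Usage with made-up data: a large group and a small group.
rng = np.random.default_rng(0)
vals = np.concatenate([rng.uniform(0.2, 0.4, 900), rng.uniform(0.6, 0.9, 100)])
grps = np.array(["majority"] * 900 + ["minority"] * 100)
per_group, overall = stratified_private_mean(vals, grps, 1.0, 0.0, 1.0, rng)
print(per_group, overall)
```

Under these assumptions the noise added to each group's mean scales with 1 / (group size), so small groups receive noisier estimates, while the size-weighted recombination gives those noisier estimates proportionally less weight in the population-level figure.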