We present a rigorous methodology for auditing differentially private machine learning algorithms by adding multiple carefully designed examples called canaries. We take a first-principles approach based on three key components. First, we introduce Lifted Differential Privacy (LiDP), which extends the definition of differential privacy to handle randomized datasets. This gives us the freedom to design randomized canaries. Second, we audit LiDP by trying to distinguish between a model trained with $K$ canaries and one trained with $K - 1$ canaries in the dataset, leaving one canary out. By drawing the canaries i.i.d., LiDP can leverage the symmetry in the design and reuse each privately trained model to run multiple statistical tests, one for each canary. Third, we introduce novel confidence intervals that take advantage of the multiple test statistics by adapting to the empirical higher-order correlations. Together, these components yield significant improvements in sample complexity, shown both theoretically and empirically on synthetic and real data. Further, recent advances in designing stronger canaries can be readily incorporated into the new framework.
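The audit described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the input layout, and the default parameters are assumptions made for illustration, and it swaps in a plain Hoeffding bound where the abstract describes intervals that adapt to higher-order correlations. It assumes the caller has already run the per-canary tests and recorded binary outcomes for canaries that were in the training set (`in_hits`) and for canaries that were not (`out_hits`), one row per trained model.

```python
import math


def hoeffding_radius(n, beta):
    """One-sided Hoeffding radius for the mean of n i.i.d. values in [0, 1]."""
    return math.sqrt(math.log(1.0 / beta) / (2.0 * n))


def audit_epsilon_lower_bound(in_hits, out_hits, delta=0.0, beta=0.05):
    """Lower-bound epsilon from per-canary test outcomes (illustrative sketch).

    in_hits[t][k]  is 1 if the test fired on the k-th canary that WAS in the
                   training set of trial t, else 0.
    out_hits[t][j] is 1 if the test fired on the j-th canary that was NOT in
                   the training set of trial t, else 0.
    The returned bound holds with probability at least 1 - beta.
    """
    n = len(in_hits)
    if n == 0 or n != len(out_hits):
        raise ValueError("need the same positive number of trials in both inputs")

    # Average over canaries within each trial. Canaries that share a model are
    # correlated, but the per-trial averages are i.i.d. across trials.
    in_rates = [sum(row) / len(row) for row in in_hits]
    out_rates = [sum(row) / len(row) for row in out_hits]

    # Split the failure probability between the two one-sided bounds.
    radius = hoeffding_radius(n, beta / 2.0)
    p_in_lower = sum(in_rates) / n - radius
    p_out_upper = min(1.0, sum(out_rates) / n + radius)

    # (epsilon, delta)-DP implies p_in <= exp(epsilon) * p_out + delta.
    if p_in_lower - delta <= 0.0:
        return 0.0
    return max(0.0, math.log((p_in_lower - delta) / p_out_upper))
```

Averaging within a trial before applying the bound keeps the sketch valid without any assumption on how the test statistics of one model depend on each other. The cost is that the interval width shrinks only with the number of trained models, not with the number of canaries, which is the gap the adaptive confidence intervals in the abstract are meant to close.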