Privacy auditing provides an important safeguard by estimating the actual information leaked by a model, thus ensuring that theoretical privacy guarantees hold in practice. We study empirical privacy auditing for differentially private (DP) machine learning, focusing on efficient one-run methods for mechanisms such as DP-SGD. Prior one-run approaches threshold training examples or "canaries" into binary membership guesses, which discards useful information. We show that, in the white-box DP-SGD setting, canary-aligned signals naturally form a sequence of random variables whose normalized sum is asymptotically Gaussian. Leveraging this distributional perspective, we develop a DP-auditing framework that leads to tighter privacy lower bounds from a single training run.
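To make the distributional idea concrete, here is a minimal sketch in Python. It is not the framework developed in this work. It is a toy model under loudly stated assumptions: each canary is included by an independent fair coin flip, its aligned signal is the inclusion bit plus Gaussian noise of known standard deviation `noise_std`, and the mechanism is treated as Gaussian so that a lower confidence bound on the mean gap can be read as a lower bound on the Gaussian DP parameter mu and converted to epsilon with the standard Gaussian DP formula. All function names (`simulate_canary_signals`, `mu_lower_bound`, `gdp_delta`, `gdp_to_epsilon`) are hypothetical and chosen for illustration only.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for cdf and quantiles


def simulate_canary_signals(num_canaries, noise_std, seed=0):
    """Toy model (assumption): signal = inclusion bit + N(0, noise_std^2) noise."""
    rng = random.Random(seed)
    included = [rng.random() < 0.5 for _ in range(num_canaries)]
    signals = [float(b) + rng.gauss(0.0, noise_std) for b in included]
    return included, signals


def mu_lower_bound(included, signals, noise_std, confidence=0.95):
    """One-sided lower confidence bound on mu = (mean gap) / noise_std.

    Uses every signal value instead of thresholding it into a binary guess,
    and relies on a Gaussian approximation for the difference of means.
    """
    ins = [s for b, s in zip(included, signals) if b]
    outs = [s for b, s in zip(included, signals) if not b]
    if not ins or not outs:
        raise ValueError("need at least one included and one excluded canary")
    gap = fmean(ins) - fmean(outs)
    std_err = noise_std * math.sqrt(1.0 / len(ins) + 1.0 / len(outs))
    z = STD_NORMAL.inv_cdf(confidence)
    return max(0.0, (gap - z * std_err) / noise_std)


def gdp_delta(eps, mu):
    """delta(eps) of a mu-Gaussian-DP mechanism (standard GDP duality formula)."""
    head = STD_NORMAL.cdf(-eps / mu + mu / 2.0)
    tail = STD_NORMAL.cdf(-eps / mu - mu / 2.0)
    # Skip exp(eps) when the tail has underflowed to zero.
    return head - (math.exp(eps) * tail if tail > 0.0 else 0.0)


def gdp_to_epsilon(mu, delta):
    """Epsilon at which a mu-GDP mechanism attains the given delta (bisection)."""
    if mu <= 0.0 or gdp_delta(0.0, mu) <= delta:
        return 0.0
    lo, hi = 0.0, 1.0
    while gdp_delta(hi, mu) > delta:  # delta(eps) decreases in eps
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return lo  # conservative end of the bracket


# Illustrative run with made-up parameters (true mu in this toy model is 1.0).
included, signals = simulate_canary_signals(num_canaries=2000, noise_std=1.0)
mu_lo = mu_lower_bound(included, signals, noise_std=1.0)
eps_lo = gdp_to_epsilon(mu_lo, delta=1e-5)
print(f"mu lower bound: {mu_lo:.3f}, epsilon lower bound: {eps_lo:.3f}")
```

The point of the sketch is the contrast with thresholding: the bound is computed from the real-valued signals themselves, so no information is discarded by rounding each canary to a membership guess. The resulting epsilon is only meaningful under the Gaussian assumption stated above.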