Auditing Differentially Private Stochastic Gradient Descent (DP-SGD) in the final-model setting is challenging and often yields empirical lower bounds that are significantly looser than the theoretical privacy guarantees. We introduce a novel auditing method that achieves tighter empirical lower bounds without additional assumptions by crafting worst-case adversarial samples through loss-based input-space auditing. Our approach surpasses traditional canary-based heuristics and is effective in both white-box and black-box scenarios. Specifically, with a theoretical privacy budget of $\varepsilon = 10.0$, our method achieves empirical lower bounds of $6.68$ in the white-box setting and $4.51$ in the black-box setting, compared to a baseline of $4.11$, all on MNIST. Moreover, we show that meaningful auditing results can be obtained using in-distribution (ID) samples as canaries, yielding an empirical lower bound of $4.33$ in a setting where traditional methods detect almost no leakage. Our work offers a practical framework for reliable and accurate privacy auditing in differentially private machine learning.
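For context, the sketch below illustrates one standard way such empirical lower bounds on $\varepsilon$ can be computed from the outcomes of a loss-based membership-inference audit: a loss-threshold test distinguishes runs that did and did not include a canary, and Clopper-Pearson confidence intervals on the resulting true/false positive rates are converted into a statistically valid lower bound on $\varepsilon$. This is a minimal illustration under those assumptions, not the paper's implementation; the function names, the threshold attack, the confidence level, and the toy losses are all placeholders.

```python
# Minimal sketch: empirical lower bound on epsilon from a loss-based audit.
# Assumes a simple loss-threshold membership test and Clopper-Pearson
# confidence intervals; everything here is illustrative, not the paper's code.
import numpy as np
from scipy.stats import beta


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Two-sided (1 - alpha) Clopper-Pearson interval for a proportion k/n."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi


def empirical_epsilon_lower_bound(tp, fn, fp, tn, delta=1e-5, alpha=0.05):
    """Convert audit outcomes into a lower bound on epsilon.

    For an (eps, delta)-DP training algorithm, any membership test satisfies
    TPR <= e^eps * FPR + delta and (1 - FPR) <= e^eps * (1 - TPR) + delta,
    so a conservative (lower) TPR estimate and (upper) FPR estimate give a
    statistically valid lower bound on eps.
    """
    tpr_lo, _ = clopper_pearson(tp, tp + fn, alpha)
    _, fpr_hi = clopper_pearson(fp, fp + tn, alpha)
    candidates = [0.0]
    if fpr_hi > 0:
        candidates.append(np.log(max(tpr_lo - delta, 1e-12) / fpr_hi))
    if tpr_lo < 1:
        candidates.append(np.log(max(1 - fpr_hi - delta, 1e-12) / (1 - tpr_lo)))
    return max(candidates)


def loss_threshold_attack(losses_in, losses_out, threshold):
    """Guess 'member' whenever the canary's loss falls below the threshold."""
    tp = int(np.sum(losses_in < threshold))   # canary trained on, flagged as member
    fn = len(losses_in) - tp
    fp = int(np.sum(losses_out < threshold))  # canary held out, flagged as member
    tn = len(losses_out) - fp
    return tp, fn, fp, tn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for per-run canary losses with and without the canary in training.
    losses_in = rng.normal(0.8, 0.3, size=500)
    losses_out = rng.normal(1.4, 0.3, size=500)
    counts = loss_threshold_attack(losses_in, losses_out, threshold=1.1)
    print("empirical eps lower bound:", empirical_epsilon_lower_bound(*counts))
```

The reported gains in the abstract come from how the canary itself is crafted (worst-case adversarial samples found by loss-based input-space search) rather than from this bookkeeping step, which is common to most auditing pipelines.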