This paper presents a nearly tight audit of the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the black-box model. Our auditing procedure empirically estimates the privacy leakage from DP-SGD using membership inference attacks; unlike prior work, the estimates are appreciably close to the theoretical DP bounds. The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters. For models trained with theoretical $\varepsilon=10.0$ on MNIST and CIFAR-10, our auditing procedure yields empirical estimates of $7.21$ and $6.95$, respectively, on 1,000-record samples and $6.48$ and $4.96$ on the full datasets. By contrast, previous work achieved tight audits only in stronger (i.e., less realistic) white-box models that allow the adversary to access the model's inner parameters and insert arbitrary gradients. Our auditing procedure can be used to detect bugs and DP violations more easily and offers valuable insight into how the privacy analysis of DP-SGD can be further improved.
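To illustrate how a membership inference attack yields an empirical privacy estimate, the following is a minimal sketch rather than the paper's actual procedure. It relies on the standard hypothesis-testing characterization of $(\varepsilon,\delta)$-DP, under which any attack's false positive rate (FPR) and false negative rate (FNR) must satisfy $\mathrm{FPR} + e^{\varepsilon}\,\mathrm{FNR} \ge 1-\delta$ and $\mathrm{FNR} + e^{\varepsilon}\,\mathrm{FPR} \ge 1-\delta$. The function name `empirical_epsilon`, the default `delta`, and the use of raw point estimates are assumptions made for illustration; a statistically sound audit would replace the raw rates with upper confidence bounds (e.g., Clopper-Pearson) before applying the formula.

```python
import math


def empirical_epsilon(fpr: float, fnr: float, delta: float = 1e-5) -> float:
    """Lower bound on epsilon implied by an attack's error rates.

    Hypothetical helper (not from the paper): rearranges the two
    (epsilon, delta)-DP hypothesis-testing constraints to solve for epsilon.
    """
    bounds = [0.0]  # epsilon is never negative
    for a, b in ((fpr, fnr), (fnr, fpr)):
        numerator = 1.0 - delta - a
        if numerator <= 0.0:
            continue  # this constraint is vacuous for these error rates
        if b == 0.0:
            return math.inf  # a zero error rate implies unbounded leakage
        bounds.append(math.log(numerator / b))
    return max(bounds)


# An attack no better than random guessing gives no evidence of leakage.
print(empirical_epsilon(0.5, 0.5, delta=0.0))
# A strong attack with 1% error of each type implies a larger epsilon.
print(empirical_epsilon(0.01, 0.01, delta=0.0))
```

Comparing the value returned here with the theoretical $\varepsilon$ is what makes an audit "tight": the closer the empirical lower bound is to the theoretical upper bound, the less slack remains in the privacy analysis. An empirical estimate that exceeds the theoretical value would indicate a bug or DP violation.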