This paper presents an auditing procedure for the Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm in the black-box threat model that is substantially tighter than prior work. The main intuition is to craft worst-case initial model parameters, as DP-SGD's privacy analysis is agnostic to the choice of the initial model parameters. For models trained on MNIST and CIFAR-10 at theoretical $\varepsilon = 10.0$, our auditing procedure yields empirical estimates of $\varepsilon_{emp} = 7.21$ and $6.95$, respectively, on a 1,000-record sample, and $\varepsilon_{emp} = 6.48$ and $4.96$ on the full datasets. By contrast, previous audits were only (relatively) tight in stronger white-box threat models, where the adversary can access the model's internal parameters and insert arbitrary gradients. Overall, our auditing procedure offers valuable insight into how the privacy analysis of DP-SGD could be improved and can detect bugs and DP violations in real-world implementations. The source code needed to reproduce our experiments is available at https://github.com/spalabucr/bb-audit-dpsgd.
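The empirical estimates $\varepsilon_{emp}$ reported above are typically derived from a membership-inference attack's error rates via the hypothesis-testing characterization of $(\varepsilon, \delta)$-DP. The following is a minimal sketch of that standard conversion as a point estimate; the function name is ours, and a real audit would additionally use confidence intervals (e.g., Clopper-Pearson bounds) on the measured false-positive and false-negative rates rather than raw frequencies.

```python
import math


def empirical_epsilon(fpr: float, fnr: float, delta: float = 1e-5) -> float:
    """Point estimate of the epsilon lower bound implied by an attack's
    false-positive rate (fpr) and false-negative rate (fnr), using the
    standard (epsilon, delta)-DP hypothesis-testing relation:
        fpr + e^eps * fnr >= 1 - delta   (and symmetrically with roles swapped)
    Rearranging gives eps >= log((1 - delta - fpr) / fnr), taking the
    larger of the two symmetric bounds."""
    bounds = []
    if fnr > 0 and (1 - delta - fpr) > 0:
        bounds.append(math.log((1 - delta - fpr) / fnr))
    if fpr > 0 and (1 - delta - fnr) > 0:
        bounds.append(math.log((1 - delta - fnr) / fpr))
    return max(bounds) if bounds else float("inf")


# A random-guessing attack (fpr = fnr = 0.5) certifies no privacy leakage:
print(empirical_epsilon(0.5, 0.5, delta=0.0))
# A very confident attack at low FPR certifies a much larger epsilon:
print(empirical_epsilon(0.001, 0.63))
```

Note that this only yields a *lower* bound on the true $\varepsilon$: a stronger attack (here, one starting from worst-case initial parameters) pushes the bound closer to the theoretical budget, which is what makes the audit "tight".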