Differentially Private Stochastic Gradient Descent (DP-SGD) is a popular iterative algorithm used to train machine learning models while formally guaranteeing the privacy of users. However, the privacy analysis of DP-SGD makes the unrealistic assumption that all intermediate iterates (a.k.a. the internal state) of the algorithm are released, whereas in practice only the final trained model, i.e., the final iterate of the algorithm, is released. In this hidden state setting, prior work has provided tighter analyses, albeit only when the loss function is constrained, e.g., strongly convex and smooth, or linear. On the other hand, the privacy leakage observed empirically from hidden state DP-SGD, even with non-convex loss functions, suggests that there is in fact a gap between the theoretical privacy analysis and the privacy guarantees achieved in practice. It has therefore remained an open question whether privacy amplification for DP-SGD is possible in the hidden state setting for general loss functions. Unfortunately, this work answers that question negatively. By carefully constructing a loss function for DP-SGD, we show that for specific loss functions, the final iterate of DP-SGD alone leaks as much information as the sequence of all iterates combined. Furthermore, we verify this result empirically by evaluating the privacy leakage from the final iterate of DP-SGD trained with our loss function, and show that it exactly matches the theoretical upper bound guaranteed by DP. We therefore show that the current privacy analysis of DP-SGD is tight for general loss functions, and conclude that no privacy amplification is possible for DP-SGD in general, i.e., for all (possibly non-convex) loss functions.
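For readers unfamiliar with the algorithm under analysis, the following is a minimal sketch of a single DP-SGD update, per-example gradient clipping followed by Gaussian noise addition, using NumPy. All names and default parameter values here are illustrative, not taken from the paper's experiments.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_mult=1.0, rng=None):
    """One DP-SGD iterate (illustrative sketch):
    clip each per-example gradient to L2 norm `clip_norm`,
    average, then add Gaussian noise calibrated to the clip norm."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping threshold.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    batch_size = len(clipped)
    avg = np.mean(clipped, axis=0)
    # Gaussian noise with std proportional to clip_norm / batch_size.
    noise = rng.normal(0.0, noise_mult * clip_norm / batch_size,
                       size=avg.shape)
    return w - lr * (avg + noise)
```

The standard privacy accounting releases every such iterate `w`; the hidden state question studied here is whether releasing only the final `w` after many steps can give a stronger guarantee.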