Differentially-private stochastic gradient descent (DP-SGD) is a family of iterative machine learning training algorithms that privatize gradients to generate a sequence of differentially-private (DP) model parameters. It is also the standard tool used to train DP models in practice, even though most users are only interested in protecting the privacy of the final model. Tight DP accounting for the last iterate would minimize the amount of noise required while maintaining the same privacy guarantee and potentially increasing model utility. However, last-iterate accounting is challenging, and existing works require strong assumptions not satisfied by most implementations. These include assuming (i) the global sensitivity constant is known (to avoid gradient clipping); (ii) the loss function is Lipschitz or convex; and (iii) input batches are sampled randomly. In this work, we forgo these unrealistic assumptions and provide privacy bounds for the most commonly used variant of DP-SGD, in which data is traversed cyclically, gradients are clipped, and only the last model is released. More specifically, we establish new Rényi differential privacy (RDP) upper bounds for the last iterate under realistic assumptions of small stepsize and Lipschitz smoothness of the loss function. Our general bounds also recover the special-case convex bounds when the weak-convexity parameter of the objective function approaches zero and no clipping is performed. The approach itself leverages optimal transport techniques for last-iterate bounds, which is a nontrivial task when the data is traversed cyclically and the loss function is nonconvex.
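The DP-SGD variant analyzed above can be sketched as follows. This is a minimal illustrative implementation, not the paper's accounting method: the function name, toy loss, and all hyperparameters are assumptions chosen for the example. Note the three features the abstract highlights: batches are traversed cyclically in a fixed order (no random sampling), each per-example gradient is clipped, and only the final iterate is released.

```python
import numpy as np

def dp_sgd_cyclic(data, theta0, grad_fn, clip_norm, noise_mult,
                  stepsize, batch_size, epochs, rng):
    """Illustrative DP-SGD sketch: cyclic (in-order) batch traversal,
    per-example gradient clipping, Gaussian noise; returns only the
    last iterate."""
    theta = theta0.copy()
    n = len(data)
    for _ in range(epochs):
        # Cyclic traversal: fixed pass over the data, no subsampling.
        for start in range(0, n, batch_size):
            batch = data[start:start + batch_size]
            clipped = []
            for x in batch:
                g = grad_fn(theta, x)
                # Clip each per-example gradient to norm <= clip_norm,
                # bounding the sensitivity without a known Lipschitz constant.
                g = g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                clipped.append(g)
            # Gaussian noise calibrated to the clipping norm.
            noise = rng.normal(0.0, noise_mult * clip_norm, size=theta.shape)
            theta = theta - stepsize * (np.sum(clipped, axis=0) + noise) / len(batch)
    # Intermediate models are discarded; only the last model is released.
    return theta

# Toy usage: private mean estimation with squared loss, grad = theta - x.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 0.5, size=(64, 3))
theta = dp_sgd_cyclic(data, np.zeros(3), lambda th, x: th - x,
                      clip_norm=1.0, noise_mult=1.0, stepsize=0.1,
                      batch_size=8, epochs=5, rng=rng)
```

Standard accountants would bound the privacy loss of the entire trajectory; the bounds in this work apply when, as in the sketch, only `theta` from the final step is ever released.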