Differentially private stochastic gradient descent (DP-SGD) refers to a family of optimization algorithms that provide a guaranteed level of differential privacy (DP) through DP accounting techniques. However, current accounting techniques make assumptions that diverge significantly from practical DP-SGD implementations. For example, they may assume the loss function is Lipschitz continuous and convex, sample the batches randomly with replacement, or omit the gradient clipping step. In this work, we analyze the most commonly used variant of DP-SGD, in which we sample batches cyclically with replacement, perform gradient clipping, and only release the last DP-SGD iterate. More specifically, without assuming convexity, smoothness, or Lipschitz continuity of the loss function, we establish new R\'enyi differential privacy (RDP) bounds for the last DP-SGD iterate under the mild assumptions that (i) the DP-SGD stepsize is small relative to the topological constants in the loss function, and (ii) the loss function is weakly convex. Moreover, we show that our bounds converge to previously established convex bounds when the weak-convexity parameter of the objective function approaches zero. In the case of smooth loss functions that are not Lipschitz continuous, we provide a weaker bound that scales well with the number of DP-SGD iterations.
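For concreteness, one step of the DP-SGD variant described above can be sketched as follows. This is a standard textbook formulation rather than a verbatim reproduction of our setup; the clipping threshold $C$, noise multiplier $\sigma$, stepsize $\eta$, batch $B_t$, per-example loss $\ell(\cdot\,; d_i)$, and horizon $T$ are notation introduced here purely for illustration:
% Minimal sketch of one DP-SGD step on a cyclically sampled batch $B_t$,
% with per-example gradient clipping (threshold $C$) and Gaussian noise
% (multiplier $\sigma$); only the last iterate $x_T$ is released.
\[
  x_{t+1} \;=\; x_t \;-\; \eta \Biggl( \frac{1}{|B_t|} \sum_{i \in B_t}
    \operatorname{clip}_{C}\bigl(\nabla \ell(x_t; d_i)\bigr)
    \;+\; \mathcal{N}\bigl(0,\, \sigma^2 C^2 I\bigr) \Biggr),
  \qquad
  \operatorname{clip}_{C}(g) \;=\; g \cdot \min\Bigl(1, \tfrac{C}{\lVert g \rVert_2}\Bigr).
\]
After $T$ such steps, only the final iterate $x_T$ is released, which is the quantity whose RDP we bound.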