Differentially private stochastic gradient descent (DP-SGD) has been widely adopted in deep learning to provide rigorously defined privacy. It requires gradient clipping, which bounds the maximum norm of individual gradients, and the addition of isotropic Gaussian noise. By analyzing the convergence rate of DP-SGD in a non-convex setting, we find that randomly sparsifying gradients before clipping and noisification adjusts a trade-off between internal components of the convergence bound and leads to a smaller upper bound when the noise is dominant. Our theoretical analysis and empirical evaluations further show that this trade-off is not trivial and may be a unique property of DP-SGD, since removing either noisification or gradient clipping eliminates the trade-off in the bound. This observation is informative, as it implies that DP-SGD has special inherent room for gradient compression, even simple random compression. To verify and exploit the observation, we propose an efficient and lightweight extension that uses random sparsification (RS) to strengthen DP-SGD. Experiments with various DP-SGD frameworks show that RS can improve performance. In addition, the sparse gradients produced by RS help reduce communication cost and strengthen privacy against reconstruction attacks, both of which are also key problems in private machine learning.
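The order of operations described above (random sparsification, then per-sample clipping, then Gaussian noise) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `rs_dp_sgd_step`, the parameters `keep_ratio` and `noise_multiplier`, the choice to share one data-independent mask across the batch, and the choice to add noise only on the retained coordinates are all assumptions made for illustration.

```python
import numpy as np

def rs_dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, keep_ratio, rng):
    """One noisy gradient estimate with random sparsification (illustrative sketch).

    per_sample_grads: array of shape (batch_size, dim), one flattened gradient per example.
    All parameter names here are hypothetical, not taken from the paper.
    """
    batch_size, dim = per_sample_grads.shape

    # 1. Random sparsification: draw a mask that does not depend on the data
    #    and keep roughly keep_ratio * dim coordinates (assumed shared by the batch).
    k = max(1, int(round(keep_ratio * dim)))
    kept = rng.choice(dim, size=k, replace=False)
    mask = np.zeros(dim, dtype=bool)
    mask[kept] = True
    sparse = per_sample_grads * mask

    # 2. Per-sample clipping: rescale each sparsified gradient to norm <= clip_norm.
    norms = np.linalg.norm(sparse, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = sparse * scale

    # 3. Noisification: isotropic Gaussian noise with std noise_multiplier * clip_norm,
    #    restricted to the retained coordinates (an assumption of this sketch).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim) * mask

    # Average the noisy sum over the batch.
    return (clipped.sum(axis=0) + noise) / batch_size, mask

rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 100))  # synthetic per-sample gradients
noisy_grad, mask = rs_dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0,
                                  keep_ratio=0.25, rng=rng)
```

In this sketch the returned gradient is exactly zero outside the mask, so only the retained coordinates and the mask (or the seed that generated it) would need to be communicated. A privacy accountant, which a real DP-SGD pipeline needs, is deliberately left out.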