Differentially private stochastic gradient descent (DP-SGD) has become the standard framework for privacy-preserving machine learning, yet its reliance on a fixed gradient clipping threshold to bound sensitivity remains a significant practical limitation. Adaptive clipping algorithms such as AdaClip shift and scale the gradient before clipping and adding noise, so that the clipped gradient yields a more informative descent direction. The shift and scale parameters are chosen adaptively from empirical estimates of the gradient mean and variance. Existing adaptive clipping algorithms, however, do not also use these estimates for momentum to accelerate training. Conversely, DP-Adam uses Adam-like momentum updates based on the gradient mean and variance to accelerate training, but does not use these estimates for adaptive clipping. In this work, we propose Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum (DP-MacAdam), a novel algorithm that combines the two approaches by using the same mean and variance estimates for both clipping and momentum. Our analysis shows that DP-MacAdam estimates the gradient variances without bias. We also empirically evaluate the privacy and accuracy of DP-MacAdam, demonstrating that it achieves higher model utility than the DP-SGD, AdaClip, and DP-Adam baselines without requiring manual tuning of the clipping threshold.
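To make the shift, scale, clip, and noise pipeline concrete, the following is a minimal NumPy sketch of an AdaClip-style private gradient step combined with running mean and variance estimates. It is an illustration under stated assumptions, not the DP-MacAdam algorithm itself: the function names (`clip_l2`, `private_gradient`, `update_estimates`), the unit clipping norm, the decay rates, and the naive moving-average variance update are all hypothetical choices. In particular, the naive variance update here makes no attempt to correct for the injected noise, and no privacy accounting is performed.

```python
import numpy as np


def clip_l2(v, max_norm=1.0):
    """Scale v down so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(v)
    return v * min(1.0, max_norm / (norm + 1e-12))


def private_gradient(per_example_grads, shift, scale, noise_multiplier, rng):
    """Shift, scale, clip, and noise a batch of per-example gradients.

    shift and scale play the role of the mean and standard-deviation
    estimates; they are assumed to come from earlier privatized steps.
    """
    batch_size, dim = per_example_grads.shape
    # Transform each gradient with the current estimates.
    transformed = (per_example_grads - shift) / scale
    # Clip every transformed gradient to unit L2 norm.
    clipped = np.stack([clip_l2(w) for w in transformed])
    # Add Gaussian noise whose std is expressed relative to the unit clip norm.
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, noise_multiplier, size=dim)
    # Undo the transform to return to the original gradient coordinates.
    return scale * (noisy_sum / batch_size) + shift


def update_estimates(mean, var, grad, beta1=0.9, beta2=0.999):
    """Naive exponential moving averages of the gradient mean and variance.

    Assumption: this simple update ignores the bias introduced by the
    added noise; it is NOT the estimator analyzed in the abstract.
    """
    new_mean = beta1 * mean + (1.0 - beta1) * grad
    new_var = beta2 * var + (1.0 - beta2) * (grad - new_mean) ** 2
    return new_mean, new_var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, batch_size = 4, 8
    params = np.zeros(dim)
    mean, var = np.zeros(dim), np.ones(dim)

    for _ in range(3):
        # Stand-in for real per-example gradients.
        grads = rng.normal(0.5, 1.0, size=(batch_size, dim))
        g = private_gradient(grads, mean, np.sqrt(var) + 1e-8, 1.0, rng)
        # The same estimates drive both the clipping transform above
        # and the Adam-like step below.
        mean, var = update_estimates(mean, var, g)
        params = params - 0.1 * mean / (np.sqrt(var) + 1e-8)

    print(params)
```

The point of the sketch is the data flow: one pair of estimates (`mean`, `var`) is reused to set the shift and scale before clipping and to form the momentum-style parameter update.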