Differential privacy (DP) offers a robust framework for safeguarding individual data privacy. To train modern machine learning models under DP, differentially private optimizers have been widely adopted in recent years. A popular approach to privatizing an optimizer is to clip each individual gradient and add sufficiently large noise to the clipped aggregate. This approach has produced DP optimizers whose performance is comparable to that of their non-private counterparts in fine-tuning tasks or in tasks with a small number of trainable parameters. However, a significant performance drop is observed when these optimizers are applied to large-scale training. This degradation stems from the substantial noise injection required to maintain DP, which disrupts the optimizer's dynamics. This paper introduces DiSK, a novel framework designed to significantly enhance the performance of DP optimizers. DiSK employs Kalman filtering, a technique drawn from control and signal processing, to effectively denoise privatized gradients and generate progressively refined gradient estimates. To keep the method practical for large-scale training, we simplify the Kalman filtering process, minimizing its memory and computational demands. We establish theoretical privacy-utility trade-off guarantees for DiSK and prove improvements over standard DP optimizers such as DPSGD in terms of the iteration-complexity upper bound. Extensive experiments across diverse tasks, including vision tasks such as CIFAR-100 and ImageNet-1k and language fine-tuning tasks such as GLUE, E2E, and DART, validate the effectiveness of DiSK. The results showcase its ability to significantly improve the performance of DP optimizers, surpassing state-of-the-art results under the same privacy constraints on several benchmarks.
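The clip-and-noise mechanism described above can be sketched as a single DP-SGD-style update; the plain-NumPy setting, function name, and parameter names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dp_sgd_step(params, per_sample_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD-style step (illustrative sketch):
    clip each per-sample gradient to norm `clip_norm`, average,
    add Gaussian noise scaled to the clipping bound, then descend."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down only if the gradient exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise std is proportional to the per-sample sensitivity clip_norm / batch_size.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_sample_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```

The clipping bounds each sample's influence (its sensitivity), which is what lets calibrated Gaussian noise yield a DP guarantee; the noise, in turn, is the source of the large-scale performance degradation the abstract describes.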
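As a rough illustration of how Kalman filtering can denoise a stream of noisy gradient measurements, the following is a generic per-coordinate filter with a random-walk state model. This is only a textbook sketch under assumed names and a scalar-covariance simplification; it is not DiSK's simplified filter.

```python
import numpy as np

def kalman_denoise(noisy_grads, process_var, noise_var):
    """Generic Kalman filter (illustrative, not DiSK's filter):
    treat each privatized gradient as a noisy measurement of the true
    gradient, which is modeled as a slowly drifting random walk."""
    est = noisy_grads[0].copy()   # state estimate
    p = noise_var                 # scalar estimate variance (shared across coords)
    denoised = [est.copy()]
    for z in noisy_grads[1:]:
        p = p + process_var            # predict: gradient may have drifted
        k = p / (p + noise_var)        # Kalman gain
        est = est + k * (z - est)      # update: blend measurement into estimate
        p = (1.0 - k) * p
        denoised.append(est.copy())
    return denoised
```

With a static gradient (`process_var = 0`) the filter reduces to running averaging of the noisy measurements, which is the intuition behind using it to suppress DP noise while tracking the slowly changing true gradient.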