Machine learning models are often trained on sensitive data, which motivates privacy-preserving training methods such as Differentially Private Stochastic Gradient Descent (DPSGD). However, gradient clipping and noise injection cause DPSGD to lose substantial utility and converge slowly. Prior work has tried to improve DPSGD from several angles; notably, the Differentially Private Selective Update and Release (DPSUR) algorithm achieves strong model utility. Its privacy accounting, however, overlooks the change in sampling probability introduced by the selective release mechanism, which undermines the rigor of its privacy guarantee. To address this gap, we re-examine the privacy analysis of the selective release mechanism and propose a new algorithm, Differentially Private Selective Release based on Clipped Gradients (DPSR-CG). Through a newly derived privacy analysis and experiments on four datasets (MNIST, CIFAR-10, IMDB, and FMNIST), we show that DPSR-CG maintains rigorous privacy guarantees while achieving strong model performance.
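For readers unfamiliar with the two operations named above, the following is a minimal sketch of a single standard DPSGD update (per-example gradient clipping followed by Gaussian noise injection). It illustrates the baseline only, not DPSUR or DPSR-CG. The function name `dpsgd_step` and its parameters (`clip_norm`, `noise_multiplier`, `lr`) are illustrative assumptions, and the sketch performs no privacy accounting.

```python
import numpy as np

def dpsgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One illustrative DPSGD update on a flat parameter vector.

    per_example_grads has shape (batch_size, num_params).
    All names here are hypothetical; this is not the paper's code.
    """
    # Gradient clipping: rescale each per-example gradient so its
    # L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Noise injection: add Gaussian noise with standard deviation
    # noise_multiplier * clip_norm to the summed clipped gradients.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

    # Plain gradient descent step on the noisy average gradient.
    return params - lr * noisy_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = np.array([[3.0, 4.0], [0.0, 0.5]])
    new_params = dpsgd_step(np.zeros(2), grads, clip_norm=1.0,
                            noise_multiplier=1.1, lr=0.1, rng=rng)
    print(new_params)
```

Clipping bounds each example's influence on the update, and the noise scale is tied to that bound; both distort the true gradient, which is the source of the utility loss and slow convergence described above.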