Differentially Private Stochastic Gradient Descent (DPSGD) is widely used to protect training data privacy in deep learning: it clips each gradient to a predefined norm and then injects calibrated noise into the training procedure. Existing DPSGD works typically assume that gradients follow sub-Gaussian distributions and design various clipping mechanisms to optimize training performance. However, recent studies have shown that gradients in deep learning are heavy-tailed, i.e., the tails of the gradient distribution have infinite variance, so existing DPSGD mechanisms may incur excessive clipping loss on the tail gradients. To address this problem, we propose a novel approach, Discriminative Clipping~(DC)-DPSGD, with two key designs. First, we introduce a subspace identification technique to distinguish between body and tail gradients. Second, we present a discriminative clipping mechanism that applies different clipping thresholds to body and tail gradients to reduce the clipping loss. Under the non-convex condition, \ourtech{} reduces the empirical gradient norm from {${\mathbb{O}\left(\log^{\max(0,\theta-1)}(T/\delta)\log^{2\theta}(\sqrt{T})\right)}$} to {${\mathbb{O}\left(\log(\sqrt{T})\right)}$} with heavy-tail index $\theta\geq 1/2$, number of iterations $T$, and arbitrary probability $\delta$. Extensive experiments on four real-world datasets demonstrate that our approach outperforms three baselines by up to 9.72\% in terms of accuracy.
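The discriminative clipping step described above can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the function name, the two thresholds (`body_thresh`, `tail_thresh`), and the precomputed `tail_mask` are assumptions, and the subspace identification that produces the mask is taken as given rather than implemented.

```python
import numpy as np

def discriminative_clip(per_sample_grads, body_thresh, tail_thresh,
                        tail_mask, noise_mult, rng=None):
    """Sketch of one DC-DPSGD-style aggregation step (illustrative only).

    per_sample_grads : (n, d) array of per-example gradients
    tail_mask        : boolean (n,) array; True marks a tail gradient
                       (in the paper this would come from subspace
                       identification; here it is simply an input)
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = per_sample_grads.shape
    clipped = np.empty_like(per_sample_grads)
    for i in range(n):
        # Apply the per-group clipping threshold.
        c = tail_thresh if tail_mask[i] else body_thresh
        norm = np.linalg.norm(per_sample_grads[i])
        clipped[i] = per_sample_grads[i] * min(1.0, c / max(norm, 1e-12))
    # Gaussian noise calibrated to the larger threshold, which bounds
    # the per-example sensitivity of the sum.
    sigma = noise_mult * max(body_thresh, tail_thresh)
    noisy_sum = clipped.sum(axis=0) + rng.normal(0.0, sigma, size=d)
    return noisy_sum / n
```

With `noise_mult = 0` the function reduces to plain two-threshold clipping and averaging, which makes the clipping behavior easy to check in isolation.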