Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning, widely used in both training from scratch and fine-tuning large-scale language models. While DP-SGD predominantly relies on the Gaussian mechanism, the Laplace mechanism remains underutilized due to its reliance on L1 norm clipping. This constraint severely limits its practicality in high-dimensional models because the L1 norm of an n-dimensional gradient can be up to sqrt(n) times larger than its L2 norm. As a result, the required noise scale grows significantly with model size, leading to poor utility or untrainable models. In this work, we introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees. We overcome the dimensionality-driven clipping barrier by computing coordinate-wise moment bounds and applying majorization theory to construct a tight, data-independent upper bound over the full model. By exploiting the Schur-convexity of the moment accountant function, we aggregate these bounds using a carefully designed majorization set that respects the L2 clipping constraint. This yields a multivariate privacy accountant that scales gracefully with model dimension and enables the use of thousands of moments. Empirical evaluations demonstrate that our approach significantly improves the performance of Laplace DP-SGD, achieving results comparable to or better than Gaussian DP-SGD under strong privacy constraints. For instance, fine-tuning RoBERTa-base (125M parameters) on SST-2 achieves 87.88% accuracy at epsilon=0.54, outperforming Gaussian (87.16%) and standard Laplace (48.97%) under the same budget.
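The dimensionality argument above rests on a standard norm inequality: by Cauchy-Schwarz, the L1 norm of an n-dimensional gradient is at most sqrt(n) times its L2 norm, with equality when all coordinates have equal magnitude. A minimal numerical check of this bound (illustrative only; `norm_ratio` is a hypothetical helper, not part of the paper's method):

```python
import numpy as np

def norm_ratio(g):
    """Ratio ||g||_1 / ||g||_2, which Cauchy-Schwarz bounds by sqrt(len(g))."""
    return np.linalg.norm(g, 1) / np.linalg.norm(g, 2)

n = 10_000

# Worst case: equal-magnitude coordinates attain the sqrt(n) bound exactly,
# so L1 clipping at a fixed threshold forces noise that grows with dimension.
worst_case = np.ones(n)
print(norm_ratio(worst_case))  # -> 100.0, i.e. sqrt(10_000)

# A typical dense gradient-like vector still sits below the bound.
rng = np.random.default_rng(0)
typical = rng.standard_normal(n)
print(norm_ratio(typical) <= np.sqrt(n))  # -> True
```

This gap is exactly why a Laplace mechanism tied to L1 clipping degrades as models grow, and why retaining L2 clipping, as Lap2 does, keeps the required noise scale dimension-friendly.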