Training deep learning models with differential privacy (DP) degrades their performance. The training dynamics of models with DP differ significantly from those of standard training, yet the geometric properties of private learning remain largely unexplored. In this paper, we investigate sharpness, a key factor in achieving better generalization, in private learning. We show that flat minima can help reduce the negative effects of per-example gradient clipping and the addition of Gaussian noise. We then verify the effectiveness of Sharpness-Aware Minimization (SAM) for seeking flat minima in private learning. However, we also find that SAM consumes additional privacy budget and computational time because of its two-step optimization. We therefore propose a new sharpness-aware training method that mitigates the privacy-optimization trade-off. Our experimental results demonstrate that the proposed method improves the performance of deep learning models with DP, both when training from scratch and when fine-tuning. Code is available at https://github.com/jinseongP/DPSAT.
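To make the two DP operations named above concrete (per-example gradient clipping and Gaussian noise addition), the following is a minimal NumPy sketch of a standard DP-SGD-style gradient aggregation. It is not taken from the DPSAT repository and does not implement the proposed sharpness-aware method; the function name `dp_noisy_gradient` and the parameters `clip_norm` and `noise_multiplier` are assumptions chosen for illustration.

```python
import numpy as np


def dp_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Aggregate per-example gradients with clipping and Gaussian noise.

    per_example_grads: array of shape (batch_size, dim), one gradient per example.
    clip_norm: maximum L2 norm allowed for each per-example gradient (hypothetical name).
    noise_multiplier: noise standard deviation relative to clip_norm (hypothetical name).
    rng: a numpy.random.Generator used to draw the noise.
    """
    grads = np.asarray(per_example_grads, dtype=float)
    batch_size, dim = grads.shape

    # Per-example clipping: rescale any gradient whose L2 norm exceeds clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale

    # Gaussian noise with standard deviation noise_multiplier * clip_norm,
    # added once to the sum of clipped gradients.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=dim)

    # Average the noisy sum over the batch.
    return (clipped.sum(axis=0) + noise) / batch_size


# Usage: two example gradients, the first of which exceeds the clipping norm.
rng = np.random.default_rng(0)
example_grads = np.array([[3.0, 4.0], [0.3, 0.4]])
noisy_grad = dp_noisy_gradient(example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

In this sketch, clipping bounds each example's influence on the update and the noise scale is tied to that bound, which is why both operations distort the gradient relative to standard training.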