Differentially private (DP) training preserves data privacy, usually at the cost of slower convergence (and thus lower accuracy) and more severe mis-calibration than its non-private counterpart. To analyze the convergence of DP training, we formulate a continuous-time analysis through the lens of the neural tangent kernel (NTK), which characterizes the per-sample gradient clipping and the noise addition in DP training for arbitrary network architectures and loss functions. Interestingly, we show that the noise addition only affects the privacy risk but not the convergence or calibration, whereas the per-sample gradient clipping (under both flat and layerwise clipping styles) only affects the convergence and calibration. Furthermore, we observe that while DP models trained with a small clipping norm usually achieve the best accuracy, they are poorly calibrated and thus unreliable. In sharp contrast, DP models trained with a large clipping norm enjoy the same privacy guarantee and similar accuracy, but are significantly better \textit{calibrated}. Our code can be found at \url{https://github.com/woodyx218/opacus_global_clipping}.
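To make the two ingredients named above concrete, the following is a minimal sketch of one generic DP-SGD update with flat per-sample clipping followed by Gaussian noise addition. It is an illustration under stated assumptions, not the implementation in the linked repository: the function names `clip_flat` and `dp_sgd_step` and the parameters `clip_norm`, `noise_multiplier`, and `lr` are hypothetical, and the noise standard deviation is assumed to be `noise_multiplier * clip_norm`, as is common in DP-SGD.

```python
import math
import random

def clip_flat(grad, clip_norm):
    """Scale one per-sample gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def dp_sgd_step(params, per_sample_grads, clip_norm, noise_multiplier, lr, rng):
    """One update: clip each sample's gradient, sum, add noise, then average."""
    batch_size = len(per_sample_grads)
    clipped = [clip_flat(g, clip_norm) for g in per_sample_grads]
    # Coordinate-wise sum of the clipped per-sample gradients.
    summed = [sum(column) for column in zip(*clipped)]
    # Assumed noise scale: proportional to the clipping norm.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]

rng = random.Random(0)
params = [0.0, 0.0]
# Two per-sample gradients with L2 norms 5.0 and 0.5.
grads = [[3.0, 4.0], [0.3, 0.4]]
# With noise_multiplier=0.0 the update is deterministic, which isolates clipping.
new_params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=0.0, lr=1.0, rng=rng)
```

In this toy call only the first gradient exceeds the clipping norm, so only it is rescaled; lowering `clip_norm` further would rescale both. Setting `noise_multiplier` above zero perturbs the summed gradient without changing how the clipping acts on each sample.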