Gradient regularization, as described in \citet{barrett2021implicit}, is an effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that it can substantially improve the robustness of deep learning models to noise perturbations while also reducing test error. In this paper, we study per-example gradient regularization (PEGR) and present a theoretical analysis showing that it improves both test error and robustness to noise perturbations. Specifically, we adopt the signal-noise data model of \citet{cao2022benign} and show that PEGR learns the signal effectively while suppressing the noise. In contrast, standard gradient descent struggles to distinguish the signal from the noise, which leads to suboptimal generalization. Our analysis reveals that PEGR penalizes the variance of pattern learning and thereby suppresses the memorization of noise in the training data. These findings underscore the importance of variance control in deep learning training and offer insights for developing more effective training methods.
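
As a rough illustration of the variance-control claim, consider one common way to write a per-example gradient penalty. The notation below ($\ell_i$, $g_i$, $\bar{g}$, $\lambda$) is assumed for this sketch and may differ from the formal definition used in the analysis. Let $g_i = \nabla_W \ell_i(W)$ denote the gradient of the loss on the $i$-th of $n$ training examples, and let $\bar{g} = \frac{1}{n}\sum_{i=1}^{n} g_i$ denote the full-batch gradient. An objective of the assumed form
\[
  L_\lambda(W) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell_i(W) \;+\; \frac{\lambda}{2n}\sum_{i=1}^{n} \lVert g_i \rVert_2^2
\]
has a penalty term that satisfies the elementary identity
\[
  \frac{1}{n}\sum_{i=1}^{n} \lVert g_i \rVert_2^2 \;=\; \lVert \bar{g} \rVert_2^2 \;+\; \frac{1}{n}\sum_{i=1}^{n} \lVert g_i - \bar{g} \rVert_2^2 .
\]
Under this assumed form, the penalty equals the squared norm of the full-batch gradient plus the empirical variance of the per-example gradients around it. A penalty on $\lVert \bar{g} \rVert_2^2$ alone would leave the second term unconstrained, which is one way to read the statement that PEGR controls variance.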