Sharpness-Aware Minimization (SAM) aims to improve generalization by minimizing a worst-case perturbed loss over a small neighborhood of the model parameters. During training, however, its optimization behavior does not always match theoretical expectations, because both sharp and flat regions can yield a small perturbed loss. In such cases the gradient may still point toward sharp regions, defeating SAM's intended effect. To address this issue, we study SAM from a spectral and geometric perspective: specifically, we use the angle between the gradient and the leading eigenvector of the Hessian as a measure of sharpness. Our analysis shows that when this angle is at most ninety degrees, the effect of SAM's sharpness regularization can be weakened. Building on this observation, we propose explicit eigenvector-aligned SAM (X-SAM), which corrects the gradient via an orthogonal decomposition along the top eigenvector, enabling more direct and efficient regularization of the Hessian's maximum eigenvalue. We prove X-SAM's convergence and its generalization advantage, and extensive experimental evaluations confirm both the theoretical and practical benefits.
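The two geometric ingredients above, the angle between the gradient and the Hessian's leading eigenvector, and the orthogonal decomposition of the gradient along that eigenvector, can be sketched on a toy quadratic loss. This is only an illustrative sketch: the matrix `H`, the helper names, and the boost factor `alpha` are hypothetical, and the abstract does not specify X-SAM's exact update rule.

```python
import numpy as np

# Toy quadratic loss L(w) = 0.5 * w^T H w, so grad L(w) = H w and the
# Hessian is the constant matrix H. (H and all names here are illustrative.)
H = np.array([[4.0, 1.0],
              [1.0, 0.5]])

def grad(w):
    # Gradient of the toy quadratic loss.
    return H @ w

def top_eigvec(H, iters=200):
    # Power iteration: converges to the leading Hessian eigenvector
    # when the top eigenvalue is strictly dominant (true for this H).
    v = np.ones(H.shape[0])
    for _ in range(iters):
        v = H @ v
        v /= np.linalg.norm(v)
    return v

w = np.array([1.0, -2.0])
g = grad(w)
v = top_eigvec(H)

# Sharpness measure from the abstract: the angle between the gradient
# and the leading eigenvector (v is unit-norm after power iteration).
cos_angle = (g @ v) / np.linalg.norm(g)

# Orthogonal decomposition of the gradient along the top eigenvector.
g_par = (g @ v) * v   # component aligned with v
g_perp = g - g_par    # component orthogonal to v

# Hypothetical eigenvector-aligned correction: amplify the aligned
# component by alpha so the step targets the maximum eigenvalue more
# directly. The actual X-SAM rule may differ.
alpha = 2.0
g_corrected = g_perp + alpha * g_par
```

In practice the Hessian is never materialized; the leading eigenvector would be estimated with Hessian-vector products (e.g. via automatic differentiation), and power iteration as above only needs those products.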