Sharpness-Aware Minimization (SAM) has attracted significant attention for its effectiveness in improving generalization across various tasks. However, its underlying principles remain poorly understood. In this work, we analyze SAM's training dynamics using the maximum eigenvalue of the Hessian as a measure of sharpness, and propose a third-order stochastic differential equation (SDE), which reveals that the dynamics are driven by a complex mixture of second- and third-order terms. We show that alignment between the perturbation vector and the top eigenvector is crucial for SAM's effectiveness in regularizing sharpness, but find that this alignment is often inadequate in practice, limiting SAM's efficiency. Building on these insights, we introduce Eigen-SAM, an algorithm that explicitly aims to regularize the top Hessian eigenvalue by aligning the perturbation vector with the leading eigenvector. We validate the effectiveness of our theory and the practical advantages of our proposed approach through comprehensive experiments. Code is available at https://github.com/RitianLuo/EigenSAM.
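The quantities named in the abstract can be made concrete with a small numerical sketch. The example below is not the Eigen-SAM implementation from the linked repository. It is a toy illustration under stated assumptions: a quadratic loss with a hand-picked Hessian `A`, a power iteration over Hessian-vector products to estimate the top eigenpair, and the standard normalized-gradient SAM perturbation with radius `rho`. The helper names `top_eigenpair`, `sam_perturbation`, and `alignment` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def top_eigenpair(hvp, dim, iters=200, seed=0):
    """Estimate the dominant Hessian eigenpair by power iteration.

    `hvp` maps a vector v to H @ v. Power iteration converges to the
    eigenvalue of largest magnitude, which equals the maximum eigenvalue
    only when the positive end of the spectrum dominates (assumed here).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hvp(v)
        v = hv / np.linalg.norm(hv)
    eigval = float(v @ hvp(v))  # Rayleigh quotient at the converged vector
    return eigval, v


def sam_perturbation(grad, rho):
    """Standard SAM ascent direction: the gradient rescaled to norm rho."""
    return rho * grad / (np.linalg.norm(grad) + 1e-12)


def alignment(perturbation, eigvec):
    """Absolute cosine between the perturbation and the top eigenvector."""
    denom = np.linalg.norm(perturbation) * np.linalg.norm(eigvec)
    return float(abs(perturbation @ eigvec) / denom)


# Toy quadratic loss L(w) = 0.5 * w^T A w, so the Hessian is A everywhere.
# The values of A, w, and rho are illustrative assumptions.
A = np.diag([10.0, 1.0, 0.1])
w = np.array([0.1, 1.0, 1.0])
grad = A @ w                      # gradient of the quadratic at w


def hvp(v):
    return A @ v                  # exact Hessian-vector product for this toy loss


sharpness, v_top = top_eigenpair(hvp, dim=3)
eps = sam_perturbation(grad, rho=0.05)
align = alignment(eps, v_top)

print(f"top Hessian eigenvalue (sharpness): {sharpness:.4f}")
print(f"alignment |cos(eps, v_top)|:        {align:.4f}")
```

In this toy setting the perturbation is only partially aligned with the top eigenvector, because the gradient also has components along flatter directions. This is the kind of gap that the abstract identifies as limiting SAM's efficiency. In a real network the Hessian-vector product would come from automatic differentiation rather than an explicit matrix.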