Sharpness-Aware Minimization (SAM) has emerged as a powerful method for improving generalization in machine learning models by minimizing the sharpness of the loss landscape. Despite its success, however, several important questions about the convergence properties of SAM in non-convex settings remain open, including the benefit of using normalization in the update rule, the dependence of existing analyses on the restrictive bounded-variance assumption, and convergence guarantees under different sampling strategies. To address these questions, we provide a unified analysis of SAM and its unnormalized variant (USAM) under a single flexible update rule (Unified SAM), and we present convergence results for the new algorithm under a relaxed and more natural assumption on the stochastic noise. Our analysis yields convergence guarantees for SAM under several step-size selections, both for general non-convex problems and for functions satisfying the Polyak-Łojasiewicz (PL) condition (a non-convex generalization of strong convexity). The proposed theory holds under the arbitrary sampling paradigm, which includes importance sampling as a special case, allowing us to analyze variants of SAM that have never been explicitly considered in the literature. Experiments validate the theoretical findings and further demonstrate the practical effectiveness of Unified SAM in training deep neural networks for image classification tasks.
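Since the abstract only names the unified update rule, a minimal sketch may help fix ideas. The snippet below assumes one plausible parameterization in which a parameter λ ∈ [0, 1] interpolates between SAM's normalized ascent perturbation (λ = 1) and USAM's unnormalized one (λ = 0); the exact rule, step-size schedule, and sampling scheme in the paper may differ. The names `unified_sam_step`, `gamma`, `rho`, and `lam` are illustrative, not taken from the source.

```python
import numpy as np

def unified_sam_step(x, grad_fn, gamma=0.1, rho=0.05, lam=1.0):
    """One step of a Unified SAM-style update (illustrative sketch).

    lam = 1.0 recovers SAM (normalized ascent perturbation);
    lam = 0.0 recovers USAM (unnormalized perturbation).
    grad_fn returns a (possibly stochastic) gradient at a point.
    """
    g = grad_fn(x)
    # Hypothetical interpolation between the SAM and USAM perturbations.
    denom = lam * np.linalg.norm(g) + (1.0 - lam)
    eps = rho * g / denom if denom > 0 else np.zeros_like(g)
    # Descent step using the gradient evaluated at the perturbed point.
    return x - gamma * grad_fn(x + eps)

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
x = np.array([3.0, -2.0])
for _ in range(200):
    x = unified_sam_step(x, grad_fn=lambda z: z, gamma=0.1, rho=0.05, lam=1.0)
print(x)  # settles in a small neighborhood of the minimizer at the origin
```

With a constant step size the iterates converge only to a neighborhood of the minimizer, since the fixed-radius perturbation `rho` leaves a residual; this matches the usual behavior of constant-step-size analyses of SAM-type methods.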