Sharpness-aware minimization (SAM), which searches for flat minima via min-max optimization, has been shown to improve model generalization. However, each SAM update requires computing two gradients, so its computational cost and training time are both double those of standard empirical risk minimization (ERM). Recent state-of-the-art methods accelerate SAM by reducing the fraction of SAM updates, switching between SAM and ERM updates either randomly or periodically. In this paper, we design an adaptive policy that decides when to employ SAM based on the geometry of the loss landscape, and we propose two efficient algorithms built on it: AE-SAM and AE-LookSAM. We show theoretically that AE-SAM has the same convergence rate as SAM. Experimental results on various datasets and architectures demonstrate the efficiency and effectiveness of the adaptive policy.
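The following is a minimal sketch, under stated assumptions, of why SAM costs two gradients per step and how an adaptive switch between SAM and ERM reduces that cost. It uses a hypothetical toy quadratic loss in pure Python. The SAM step uses the standard first-order approximation of the inner maximization (perturb the weights by `rho * g / ||g||`, then take the gradient there). The switching rule in `AdaptivePolicy` is an illustrative assumption: it triggers SAM when the squared gradient norm exceeds an exponential moving average of its mean plus `c` standard deviations. The abstract does not give the exact rule. All names (`grad`, `sam_direction`, `AdaptivePolicy`, `train`) and hyperparameters are hypothetical.

```python
import math


def grad(w, curv):
    """Gradient of the toy quadratic loss L(w) = 0.5 * sum(c_i * w_i**2)."""
    return [c * x for c, x in zip(curv, w)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def sam_direction(w, g, curv, rho):
    """SAM's second gradient: evaluate the gradient at w + rho * g / ||g||,
    the first-order approximation of the worst-case perturbation."""
    n = norm(g)
    if n == 0.0:
        return g  # stationary point: no perturbation direction exists
    w_adv = [x + rho * gi / n for x, gi in zip(w, g)]
    return grad(w_adv, curv)


class AdaptivePolicy:
    """Hypothetical rule (an assumption, not the paper's exact policy):
    use SAM when the squared gradient norm is large relative to its
    running EMA mean and standard deviation."""

    def __init__(self, decay=0.9, c=1.0):
        self.decay = decay
        self.c = c
        self.mean = None
        self.var = 0.0

    def use_sam(self, sq_norm):
        if self.mean is None:
            self.mean = sq_norm
            return True  # arbitrary choice: take SAM on the first step
        d = self.decay
        self.mean = d * self.mean + (1 - d) * sq_norm
        self.var = d * self.var + (1 - d) * (sq_norm - self.mean) ** 2
        return sq_norm >= self.mean + self.c * math.sqrt(self.var)


def train(w, curv, steps, lr=0.1, rho=0.05, policy=None):
    """Gradient descent returning (final weights, gradient evaluations).
    policy=None runs plain SAM on every step."""
    grad_evals = 0
    for _ in range(steps):
        g = grad(w, curv)  # ERM gradient, always needed
        grad_evals += 1
        if policy is None or policy.use_sam(sum(x * x for x in g)):
            # Reuse g as SAM's first gradient; only one extra evaluation.
            g = sam_direction(w, g, curv, rho)
            grad_evals += 1
        w = [x - lr * gi for x, gi in zip(w, g)]
    return w, grad_evals


w_sam, n_sam = train([1.0, 1.0], [1.0, 10.0], steps=50)
w_ae, n_ae = train([1.0, 1.0], [1.0, 10.0], steps=50, policy=AdaptivePolicy())
print(n_sam, n_ae)
```

Plain SAM spends exactly two gradient evaluations per step. The adaptive variant spends between one and two, depending on how often the policy fires. When the policy does choose SAM, the gradient already computed for the decision is reused as SAM's first gradient, so a SAM step still costs two evaluations rather than three.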