Sharpness-aware minimization (SAM) methods have become increasingly popular for formulating the problem of minimizing both loss value and loss sharpness as a minimax objective. In this work, we increase the efficiency of the maximization and minimization parts of SAM's objective to achieve a better loss-sharpness trade-off. Taking inspiration from the Lookahead optimizer, which takes multiple descent steps ahead, we propose Lookbehind, which performs multiple ascent steps behind to enhance the maximization step of SAM and find a worst-case perturbation with higher loss. Then, to mitigate the variance in the descent step that arises from gathering gradients across the multiple ascent steps, we employ linear interpolation to refine the minimization step. Lookbehind yields benefits across a variety of tasks. In particular, we show increased generalization performance, greater robustness against noisy weights, and improved learning with less catastrophic forgetting in lifelong learning settings.
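To make the two ingredients concrete, the following is a minimal sketch of one possible reading of the abstract, not the authors' reference algorithm. Everything in it is an assumption made for illustration: the toy quadratic loss and its curvature matrix `A`, the function name `lookbehind_step`, the hyperparameters `k`, `rho`, `lr`, and `alpha`, and the exact order in which ascent and descent steps are interleaved.

```python
import numpy as np

# Hypothetical toy problem: a quadratic loss with one sharp direction.
A = np.diag([1.0, 10.0])


def loss(w):
    return 0.5 * float(w @ A @ w)


def grad(w):
    return A @ w


def lookbehind_step(w_slow, k=3, rho=0.05, lr=0.05, alpha=0.5):
    """One outer step: k ascent steps 'behind', then linear interpolation.

    Assumed scheme (not confirmed by the abstract): the perturbed point is
    pushed uphill k times, the gradient at each perturbed point drives a
    descent step on a copy of the weights, and the result is interpolated
    back toward the starting weights.
    """
    w_fast = w_slow.copy()  # weights updated by the descent steps
    w_adv = w_slow.copy()   # perturbed weights updated by the ascent steps
    for _ in range(k):
        g = grad(w_adv)
        # Normalized ascent step of radius rho, as in SAM's maximization.
        w_adv = w_adv + rho * g / (np.linalg.norm(g) + 1e-12)
        # Descent step using the gradient taken at the perturbed point.
        w_fast = w_fast - lr * grad(w_adv)
    # Linear interpolation between starting and descended weights,
    # in the style of Lookahead's slow-weight update.
    return w_slow + alpha * (w_fast - w_slow)


w = np.array([1.0, 1.0])
initial_loss = loss(w)
for _ in range(50):
    w = lookbehind_step(w)
final_loss = loss(w)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

With `k=1` and `alpha=1.0` this sketch reduces to a plain SAM update, which is one way to sanity-check it. The interpolation factor `alpha` damps the combined effect of the `k` descent steps, which is the role the abstract assigns to linear interpolation.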