The Lookahead optimizer improves the training stability of deep neural networks by maintaining a set of fast weights that "look ahead" to guide the descent direction. Here, we combine this idea with sharpness-aware minimization (SAM) to stabilize its multi-step variant and improve the loss-sharpness trade-off. We propose Lookbehind, which computes $k$ gradient ascent steps ("looking behind") at each iteration and combines the gradients to bias the descent step toward flatter minima. We apply Lookbehind on top of two popular sharpness-aware training methods -- SAM and adaptive SAM (ASAM) -- and show that our approach yields a range of benefits across a variety of tasks and training regimes. In particular, we show increased generalization performance, greater robustness against noisy weights, and higher tolerance to catastrophic forgetting in lifelong learning settings.
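To make the idea of "$k$ ascent steps, then one descent step on the combined gradients" concrete, the following is a minimal sketch on a toy quadratic loss. It is not the authors' reference implementation: the toy loss, the function name `lookbehind_step`, the hyperparameters `rho` and `lr`, and in particular the choice to combine the gradients by a plain average are all assumptions made for illustration.

```python
import math

def loss(w):
    # Toy quadratic loss with one sharp and one flat direction (illustrative only).
    return 0.5 * (10.0 * w[0] ** 2 + 1.0 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 1.0 * w[1]]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def lookbehind_step(w, k=3, rho=0.05, lr=0.05):
    """One update: k normalized ascent steps, then one descent step.

    Hypothetical helper; the averaging rule below is an assumption.
    """
    w_adv = list(w)
    grads = []
    for _ in range(k):
        g = grad(w_adv)
        n = norm(g) + 1e-12  # avoid division by zero at a stationary point
        # SAM-style ascent: move a distance rho along the normalized gradient.
        w_adv = [wi + rho * gi / n for wi, gi in zip(w_adv, g)]
        # Record the gradient at the perturbed point for later combination.
        grads.append(grad(w_adv))
    # Assumed combination rule: plain average of the k perturbed gradients.
    g_avg = [sum(col) / k for col in zip(*grads)]
    # Descent step taken from the original (unperturbed) weights.
    return [wi - lr * gi for wi, gi in zip(w, g_avg)]

w = [1.0, 1.0]
initial = loss(w)
for _ in range(100):
    w = lookbehind_step(w)
final = loss(w)
```

With `k=1` the sketch reduces to a single-step SAM-style update; larger `k` probes the loss further along the ascent path before the descent step is taken.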