Continual Learning (CL) aims to enable models to sequentially learn multiple tasks without forgetting previous knowledge. Recent studies have shown that optimizing towards flatter loss minima can improve model generalization. However, existing sharpness-aware methods for CL suffer from two key limitations: (1) they treat sharpness regularization as a unified signal without distinguishing the contributions of its components, and (2) they introduce substantial computational overhead that impedes practical deployment. To address these challenges, we propose FLAD, a novel optimization framework that decomposes sharpness-aware perturbations into gradient-aligned and stochastic-noise components, and we show that retaining only the noise component promotes generalization. We further introduce a lightweight scheduling scheme that enables FLAD to maintain significant performance gains even under constrained training time. FLAD can be seamlessly integrated into various CL paradigms and consistently outperforms standard and sharpness-aware optimizers in diverse experimental settings, demonstrating its effectiveness and practicality in CL.
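To make the decomposition idea concrete, the following is a minimal PyTorch-style sketch, not the paper's actual algorithm: it assumes the mini-batch gradient is split into a component aligned with a smoothed reference gradient (here a hypothetical exponential moving average standing in for the expected gradient) and an orthogonal residual, and builds a SAM-style perturbation of radius `rho` from the residual ("noise") component only. The names `noise_only_perturbation`, `ema_grads`, and `rho` are illustrative assumptions.

```python
import torch

@torch.no_grad()
def noise_only_perturbation(params, ema_grads, rho=0.05, eps=1e-12):
    """Hypothetical sketch: keep only the stochastic-noise part of a
    sharpness-aware perturbation by removing the component of each
    mini-batch gradient that is aligned with a reference (EMA) gradient."""
    noise_parts = []
    for p, g_ref in zip(params, ema_grads):
        if p.grad is None:
            noise_parts.append(torch.zeros_like(p))
            continue
        g = p.grad
        # Project the stochastic gradient onto the reference direction ...
        coeff = (g * g_ref).sum() / (g_ref.norm() ** 2 + eps)
        aligned = coeff * g_ref
        # ... and retain only the orthogonal residual ("noise") component.
        noise_parts.append(g - aligned)
    # Rescale the concatenated noise to radius rho, as in SAM's ascent step.
    total_norm = torch.sqrt(sum(n.pow(2).sum() for n in noise_parts)) + eps
    return [rho * n / total_norm for n in noise_parts]
```

In a SAM-like two-step loop, one would add the returned perturbation to the parameters, recompute the loss and gradients at the perturbed point, undo the perturbation, and then take the optimizer step; the scheduling scheme mentioned above would further decide on which iterations this extra forward-backward pass is performed.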