A model's ability to generalize while incrementally acquiring dynamically updated knowledge from sequentially arriving tasks is crucial to tackling the sensitivity-stability dilemma in Continual Learning (CL). Weight loss landscape sharpness minimization, which seeks flat minima lying in neighborhoods with uniformly low loss or smooth gradients, has proven to be a strong training regime that improves model generalization compared with loss-minimization-based optimizers such as SGD. Yet only a few works have explored this regime for CL, showing that a dedicatedly designed zeroth-order sharpness optimizer can improve CL performance. In this work, we propose a Continual Flatness (C-Flat) method featuring a flatter loss landscape tailored for CL. C-Flat can be invoked with a single line of code and is plug-and-play with any CL method. We present a general framework applying C-Flat to all CL categories, together with a thorough comparison against loss-minima optimizers and flat-minima-based CL approaches, showing that our method boosts CL performance in almost all cases. Code is available at https://github.com/WanNaa/C-Flat.
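To make the sharpness-minimization idea behind flat-minima training concrete, the sketch below implements a generic sharpness-aware update (in the spirit of SAM-style optimizers, not the authors' C-Flat implementation) on a toy one-dimensional loss. The two-step rule is an assumption-laden illustration: ascend to the worst-case neighbor within a radius `rho`, then descend using the gradient evaluated there, so the optimizer is steered toward neighborhoods with uniformly low loss.

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

def sharpness_aware_step(w, lr=0.1, rho=0.05):
    """One generic sharpness-aware update (hypothetical sketch, not C-Flat).

    Step 1: perturb the weight toward the worst-case point in a
            rho-ball around w (normalized ascent along the gradient).
    Step 2: apply the ordinary descent step, but using the gradient
            measured at that perturbed point.
    """
    g = grad(w)
    eps = rho * g / (abs(g) + 1e-12)  # worst-case perturbation within radius rho
    g_adv = grad(w + eps)             # gradient at the perturbed (sharper) point
    return w - lr * g_adv

w = 0.0
for _ in range(200):
    w = sharpness_aware_step(w)
# w ends up near the minimum at 3, hovering within the rho-neighborhood.
```

In the plug-and-play spirit the abstract describes, such a rule only replaces the per-step weight update, so it can wrap any base training loop without changing the surrounding CL method.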