Recent work in continual learning has highlighted the stability gap -- a temporary performance drop on previously learned tasks when new ones are introduced. This phenomenon reflects a mismatch between rapid adaptation and strong retention at task boundaries, underscoring the need for optimization mechanisms that balance plasticity and stability across abrupt distribution shifts. While optimizers such as momentum-SGD and Adam introduce implicit multi-timescale behavior, they still exhibit pronounced stability gaps. Importantly, these gaps persist even under ideal joint training, making this setting crucial for isolating their causes from other sources of forgetting. Motivated by how noradrenergic (neuromodulatory) bursts transiently increase neuronal gain under uncertainty, we introduce dynamic gain scaling, a two-timescale optimization technique that balances adaptation and retention by modulating effective learning rates and flattening the local loss landscape through an effective reparameterization. Across domain- and class-incremental MNIST, CIFAR, and mini-ImageNet benchmarks under task-agnostic joint training, dynamic gain scaling effectively attenuates stability gaps while maintaining competitive accuracy, improving robustness at task transitions.
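To make the two-timescale idea concrete, below is a minimal, hypothetical sketch of gain modulation of the effective learning rate in PyTorch. The surprise signal (a fast loss EMA compared against a slow one), the gain dynamics, and all names and hyperparameters (`DynamicGainScaler`, `beta_fast`, `beta_slow`, `gain_max`, `relax`) are illustrative assumptions, not the paper's exact mechanism.

```python
import torch


class DynamicGainScaler:
    """Two-timescale gain scaling of the effective learning rate (hypothetical sketch).

    A fast and a slow exponential moving average (EMA) of the training loss
    are maintained. When the fast EMA exceeds the slow one -- a simple proxy
    for uncertainty at a task boundary -- the gain bursts upward (cf.
    noradrenergic bursts) and then relaxes geometrically back toward 1.
    The gain multiplies the base learning rate of every parameter group.
    """

    def __init__(self, optimizer, beta_fast=0.5, beta_slow=0.99,
                 gain_max=5.0, relax=0.95):
        self.opt = optimizer
        self.base_lrs = [g["lr"] for g in optimizer.param_groups]
        self.beta_fast, self.beta_slow = beta_fast, beta_slow
        self.gain_max, self.relax = gain_max, relax
        self.fast = None  # fast loss EMA: tracks current batch statistics
        self.slow = None  # slow loss EMA: tracks long-run task statistics
        self.gain = 1.0

    def step(self, loss):
        loss = float(loss)
        if self.fast is None:  # initialize both EMAs on the first step
            self.fast = self.slow = loss
        self.fast = self.beta_fast * self.fast + (1.0 - self.beta_fast) * loss
        self.slow = self.beta_slow * self.slow + (1.0 - self.beta_slow) * loss

        # Surprise: relative excess of the fast estimate over the slow one.
        surprise = max(0.0, self.fast / (self.slow + 1e-12) - 1.0)
        burst = min(self.gain_max, 1.0 + surprise)
        # Fast timescale: jump to the burst level; slow timescale: relax to 1.
        self.gain = max(burst, 1.0 + self.relax * (self.gain - 1.0))

        # Modulate the effective learning rate and take the optimizer step.
        for group, base_lr in zip(self.opt.param_groups, self.base_lrs):
            group["lr"] = base_lr * self.gain
        self.opt.step()


# Usage sketch: call scaler.step(loss) in place of optimizer.step().
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = DynamicGainScaler(opt)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
scaler.step(loss.detach())
```

The design intuition sketched here is that a burst of surprise (the fast EMA overtaking the slow EMA, as happens at a task boundary) transiently raises the gain on the fast timescale, while the geometric relaxation back toward 1 provides the slow timescale; how the gain should couple to the step size under the paper's reparameterization is an assumption of this sketch.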