Humans can acquire new knowledge and transfer what they have learned to different domains while forgetting only a little. Achieving this ability, called Continual Learning, is challenging with neural networks, which tend to forget previously learned tasks when learning new ones. Replaying stored samples from past tasks can mitigate this forgetting, but long task sequences may require a large memory, and replay can also lead to overfitting on the saved samples. In this paper, we propose a novel regularisation approach, Margin Dampening, and a novel incremental classifier, Cascaded Scaling Classifier. The former combines a soft constraint and a knowledge distillation approach to preserve past knowledge while still allowing the model to learn new patterns effectively. The latter is a gated incremental classifier that helps the model modify past predictions without directly interfering with them, by adjusting the model's output with auxiliary scaling functions. We empirically show that our approach performs well against well-established baselines on multiple benchmarks, and we study each component of our proposal and how their combinations affect the final results.
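The abstract relies on two generic ingredients: a fixed-size memory of past samples for replay, and knowledge distillation from the previous model. The Python sketch below illustrates only those generic pieces, assuming reservoir sampling as the memory policy and the standard temperature-scaled KL distillation loss. The names `ReservoirBuffer`, `softmax`, and `distillation_loss` are hypothetical, and this is not the paper's Margin Dampening loss or Cascaded Scaling Classifier, whose exact formulations are not given here.

```python
import math
import random
from typing import List, Sequence, Tuple


class ReservoirBuffer:
    """Hypothetical fixed-size replay memory filled by reservoir sampling.

    A generic technique; the paper's own memory policy may differ.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[Tuple[object, int]] = []  # stored (input, label) pairs
        self.seen = 0  # total number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, x: object, y: int) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((x, y))
        else:
            # Replace a random slot with probability capacity / seen, so every
            # sample seen so far is kept with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = (x, y)

    def sample(self, k: int) -> List[Tuple[object, int]]:
        # Draw up to k stored samples without replacement for a replay batch.
        return self.rng.sample(self.items, min(k, len(self.items)))


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Standard temperature-scaled KL(teacher || student) distillation loss.

    In continual learning the logits passed here would typically be restricted
    to the classes of past tasks (an assumption of this sketch).
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return kl * temperature ** 2  # rescale so gradients keep a comparable magnitude
```

In a training loop, one would call `add` for each incoming sample, mix `sample(k)` into each batch, and use a frozen copy of the model from the previous task as the teacher. The memory-size concern raised in the abstract corresponds directly to `capacity`: a small buffer holds few samples per past task when the task sequence is long.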