Humans can acquire new knowledge and transfer what they have learned to different domains while forgetting very little. This ability, called Continual Learning, is hard to achieve in neural networks, because learning new tasks tends to degrade performance on previously learned ones. Replaying stored samples from past tasks mitigates this forgetting, but long task sequences may require a large memory, and the model may overfit to the saved samples. In this paper, we propose a novel regularisation approach, Margin Dampening, and a novel incremental classifier, the Cascaded Scaling Classifier. The former combines a soft constraint with knowledge distillation to preserve previously learned knowledge while still allowing the model to learn new patterns effectively. The latter is a gated incremental classifier that lets the model modify past predictions without directly interfering with them, by adjusting the model's output with auxiliary scaling functions. We show empirically that our approach performs well on multiple benchmarks against well-established baselines, and we study each component of our proposal and how combinations of these components affect the final results.