Catastrophic forgetting has been a major challenge in continual learning, where the model needs to learn new tasks with limited or no access to data from previously seen tasks. To tackle this challenge, methods based on knowledge distillation in feature space have been proposed and shown to reduce forgetting. However, most feature distillation methods directly constrain the new features to match the old ones, overlooking the need for plasticity. To achieve a better stability-plasticity trade-off, we propose Backward Feature Projection (BFP), a method for continual learning that allows the new features to change up to a learnable linear transformation of the old features. BFP preserves the linear separability of the old classes while allowing new feature directions to emerge to accommodate new classes. BFP can be integrated with existing experience replay methods and boosts their performance by a significant margin. We also demonstrate that BFP helps learn a better representation space, in which linear separability is well preserved during continual learning and linear probing achieves high classification accuracy. The code can be found at https://github.com/rvl-lab-utoronto/BFP
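The difference between direct feature matching and matching up to a linear transformation can be illustrated with a small NumPy sketch. This is not the released implementation: the function names, the toy feature drift `M`, and the direction of the projection (new features mapped back onto old ones, as the name "backward" suggests) are assumptions made for illustration. The projection is also fitted here by closed-form least squares as a stand-in for learning it during training.

```python
import numpy as np

def feature_distillation_loss(f_new, f_old):
    """Direct matching: mean squared distance between new and old features."""
    return float(np.mean(np.sum((f_new - f_old) ** 2, axis=1)))

def projected_loss(f_new, f_old, A):
    """Matching up to a linear map: apply A to each new feature, then compare."""
    return float(np.mean(np.sum((f_new @ A.T - f_old) ** 2, axis=1)))

def fit_projection(f_new, f_old):
    """Least-squares stand-in for a learnable A (assumption, not the paper's training loop)."""
    # Solve f_new @ X ~= f_old for X; A = X.T then satisfies A @ f_new[i] ~= f_old[i].
    X, *_ = np.linalg.lstsq(f_new, f_old, rcond=None)
    return X.T

rng = np.random.default_rng(0)
f_old = rng.normal(size=(64, 8))                # old-model features: 64 samples, 8 dims
M = np.eye(8) + 0.1 * rng.normal(size=(8, 8))   # hypothetical linear drift of the features
f_new = f_old @ M.T                             # new features are a linear map of the old ones

A = fit_projection(f_new, f_old)
fd = feature_distillation_loss(f_new, f_old)
bfp = projected_loss(f_new, f_old, A)
print(f"direct matching loss:    {fd:.4f}")
print(f"projected matching loss: {bfp:.2e}")
```

In this toy setting the new features differ from the old ones only by a linear map, so direct matching penalizes the change while the projected loss is driven to numerical zero. A linear classifier on the old features can be composed with `A` to act on the new features, which is one way to see why linear separability of the old classes would be preserved.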