Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we consider an alternative approach that exploits training-free random projectors and class-prototype accumulation, and thereby bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head. This layer captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning. Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 10% and 62% on seven class-incremental benchmark datasets, despite not using any rehearsal memory. We conclude that the potential of pre-trained models for simple, effective, and fast continual learning has not yet been fully tapped.
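To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes features have already been extracted by a pre-trained model and shows three ingredients: a frozen random projection followed by a nonlinearity, per-class prototype accumulation, and prototype decorrelation. Several details are assumptions of this sketch, since the abstract does not specify them: the class name `RandomProjectionPrototypes`, the Gaussian projection weights, ReLU as the activation, and decorrelation via a ridge-regularised inverse of the accumulated Gram matrix.

```python
import numpy as np


class RandomProjectionPrototypes:
    """Hypothetical sketch: frozen random projection + accumulated class prototypes."""

    def __init__(self, feat_dim, proj_dim, num_classes, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen projection: drawn once and never trained (Gaussian is an assumption).
        self.W = rng.standard_normal((feat_dim, proj_dim))
        # Running statistics; both are plain sums, so no parameter is ever overwritten.
        self.G = np.zeros((proj_dim, proj_dim))     # Gram matrix of projected features
        self.C = np.zeros((num_classes, proj_dim))  # per-class sums (unnormalised prototypes)
        self.ridge = ridge                          # regularisation strength (assumption)

    def _project(self, feats):
        # Expand dimensionality, then apply a nonlinearity (ReLU is an assumption).
        return np.maximum(feats @ self.W, 0.0)

    def update(self, feats, labels):
        """Accumulate statistics for one batch or one task."""
        h = self._project(np.asarray(feats, dtype=float))
        labels = np.asarray(labels)
        self.G += h.T @ h
        np.add.at(self.C, labels, h)  # add each projected sample to its class row

    def predict(self, feats):
        """Score samples against decorrelated prototypes and return class indices."""
        h = self._project(np.asarray(feats, dtype=float))
        reg = self.G + self.ridge * np.eye(self.G.shape[0])
        # Decorrelation step (assumed form): solve (G + ridge * I) X = C^T.
        weights = np.linalg.solve(reg, self.C.T)
        return np.argmax(h @ weights, axis=1)
```

Because `update` only adds to running sums, feeding tasks one after another yields the same statistics as feeding all data at once, up to floating-point rounding. This is the sense in which such a scheme avoids forgetting without rehearsal memory: nothing learned from an earlier task is modified by a later one.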