RanPAC: Random Projections and Pre-trained Models for Continual Learning

Continual learning (CL) aims to incrementally learn different tasks (such as classification) in a non-stationary data stream without forgetting old ones. Most CL works focus on tackling catastrophic forgetting under a learning-from-scratch paradigm. However, with the increasing prominence of foundation models, pre-trained models equipped with informative representations have become available for various downstream requirements. Several CL methods based on pre-trained models have been explored, either utilizing pre-extracted features directly (which makes bridging distribution gaps challenging) or incorporating adaptors (which may be subject to forgetting). In this paper, we propose a concise and effective approach for CL with pre-trained models. Given that forgetting occurs during parameter updating, we contemplate an alternative approach that exploits training-free random projectors and class-prototype accumulation, which thus bypasses the issue. Specifically, we inject a frozen Random Projection layer with nonlinear activation between the pre-trained model's feature representations and output head, which captures interactions between features with expanded dimensionality, providing enhanced linear separability for class-prototype-based CL. We also demonstrate the importance of decorrelating the class-prototypes to reduce the distribution disparity when using pre-trained representations. These techniques prove to be effective and circumvent the problem of forgetting for both class- and domain-incremental continual learning. Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 10% and 62% on seven class-incremental benchmarks, despite not using any rehearsal memory. We conclude that the full potential of pre-trained models for simple, effective, and fast CL has not hitherto been fully tapped. Code is at github.com/RanPAC/RanPAC.
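
The pipeline described in the abstract (a frozen random projection with a nonlinear activation, accumulated class prototypes, and a decorrelation step) can be illustrated with a minimal NumPy sketch. Several details here are assumptions the abstract does not state: the Gaussian projection weights, the ReLU activation, and decorrelation via a ridge-regularized inverse of the accumulated Gram matrix. The class name `RandomProjectionPrototypes` and its parameters are hypothetical, and the input `feats` stands in for features extracted by a frozen pre-trained model.

```python
import numpy as np


class RandomProjectionPrototypes:
    """Illustrative sketch: frozen random projection + accumulated prototypes."""

    def __init__(self, feat_dim, proj_dim, num_classes, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection weights: drawn once, never trained (assumed Gaussian).
        self.W = rng.standard_normal((feat_dim, proj_dim))
        # Running statistics; only these are modified as tasks arrive.
        self.G = np.zeros((proj_dim, proj_dim))     # Gram matrix of projected features
        self.C = np.zeros((proj_dim, num_classes))  # per-class prototype sums (columns)
        self.ridge = ridge                          # assumed regularization strength
        self.num_classes = num_classes

    def project(self, feats):
        # Expand dimensionality, then apply a nonlinearity (ReLU is an assumption).
        return np.maximum(feats @ self.W, 0.0)

    def update(self, feats, labels):
        # Accumulate statistics for one task; no gradient step is involved.
        h = self.project(feats)
        onehot = np.eye(self.num_classes)[labels]
        self.G += h.T @ h
        self.C += h.T @ onehot

    def predict(self, feats):
        # Decorrelate the prototypes with the regularized Gram matrix, then score.
        h = self.project(feats)
        reg = self.ridge * np.eye(self.G.shape[0])
        weights = np.linalg.solve(self.G + reg, self.C)
        return np.argmax(h @ weights, axis=1)
```

Because `update` only adds to `G` and `C`, feeding the tasks one at a time yields the same statistics as feeding all the data at once, up to floating-point rounding. In this sketch, that is why no learned parameter is overwritten between tasks.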

