We address the Continual Learning (CL) problem, wherein a model must learn a sequence of tasks from non-stationary distributions while preserving prior knowledge upon encountering new experiences. With the advancement of foundation models, CL research has pivoted from the initial learning-from-scratch paradigm towards utilizing generic features from large-scale pre-training. However, existing approaches to CL with pre-trained models primarily focus on separating class-specific features from the final representation layer and neglect the potential of intermediate representations to capture low- and mid-level features, which are more invariant to domain shifts. In this work, we propose LayUP, a new prototype-based approach to CL that leverages second-order feature statistics from multiple intermediate layers of a pre-trained network. Our method is conceptually simple, does not require access to prior data, and works out of the box with any foundation model. LayUP surpasses the state of the art in four of the seven class-incremental learning benchmarks, in all three domain-incremental learning benchmarks, and in six of the seven online continual learning benchmarks, while significantly reducing memory and computational requirements compared to existing baselines. Our results demonstrate that fully exploiting the representational capacities of pre-trained models in CL goes well beyond their final embeddings.
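To make the core idea concrete, the following is a minimal sketch of how class prototypes and second-order feature statistics (a Gram matrix) over features from a frozen pre-trained backbone can be accumulated task by task and decoded with ridge regularization. It uses synthetic features as a stand-in for the concatenated intermediate-layer representations; the dimensions, the ridge parameter `lam`, and the helper functions are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions): suppose the concatenated
# intermediate-layer features of a frozen backbone have dimension d.
d, num_classes, lam = 256, 10, 1.0
class_means = rng.normal(size=(num_classes, d))  # stand-in for the frozen feature extractor

# Second-order statistics (Gram matrix) and class-prototype accumulators,
# updated incrementally, task by task, without revisiting old data.
G = np.zeros((d, d))             # sum of f f^T over all samples seen so far
C = np.zeros((d, num_classes))   # per-class feature sums (unnormalized prototypes)

def update(features, labels):
    """Accumulate statistics for one task (features: [n, d], labels: [n])."""
    G[...] += features.T @ features
    np.add.at(C.T, labels, features)

def classify(features):
    """Ridge-regularized class scores from the accumulated second-order statistics."""
    W = np.linalg.solve(G + lam * np.eye(d), C)   # [d, num_classes]
    return (features @ W).argmax(axis=1)

def sample(classes, n):
    """Draw synthetic features for the given classes (placeholder for real data)."""
    labels = rng.choice(classes, size=n)
    return class_means[labels] + 0.5 * rng.normal(size=(n, d)), labels

# Two sequential tasks with disjoint classes (class-incremental setting).
for task_classes in ([0, 1, 2, 3, 4], [5, 6, 7, 8, 9]):
    update(*sample(task_classes, 500))

# Evaluate on all classes after the full task sequence.
test_x, test_y = sample(list(range(num_classes)), 1000)
print("accuracy:", (classify(test_x) == test_y).mean())
```

Because only the Gram matrix and the prototype sums are stored, the memory footprint is independent of the number of samples and no prior data need to be replayed when new tasks arrive.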