The goal of continual learning is to improve the performance of recognition models on sequentially arriving data. Although most existing works are built on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis of continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate almost resolves this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling the class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in depth to facilitate subsequent research. Code is available at: https://github.com/GengDavid/SLCA.
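The two ingredients named above, a slower learning rate for the representation layer and a post-hoc alignment of the classifier from class-wise feature distributions, can be illustrated with a minimal sketch. This is not the released SLCA implementation: the learning-rate values, the diagonal-Gaussian assumption for each class, and all function names (`sgd_step`, `class_statistics`, `sample_pseudo_features`) are hypothetical and chosen only for illustration.

```python
import random
from statistics import fmean, pstdev

# Hypothetical learning rates (illustrative values, not taken from the paper):
# the representation layer is updated far more slowly than the classifier.
LEARNING_RATES = {"representation": 1e-4, "classifier": 1e-2}


def sgd_step(params, grads, layer):
    """One plain SGD update using the learning rate of the given layer group."""
    lr = LEARNING_RATES[layer]
    return [p - lr * g for p, g in zip(params, grads)]


def class_statistics(features_by_class):
    """Per-class, per-dimension mean and standard deviation of stored features.

    Assumption: each class is modeled as a diagonal Gaussian.
    """
    stats = {}
    for label, feats in features_by_class.items():
        dims = list(zip(*feats))  # transpose: one tuple per feature dimension
        stats[label] = ([fmean(d) for d in dims], [pstdev(d) for d in dims])
    return stats


def sample_pseudo_features(stats, n_per_class, seed=0):
    """Draw (feature, label) pairs from each class's Gaussian.

    These pairs could be used to re-train the classification layer after
    all tasks have been seen (the post-hoc alignment step).
    """
    rng = random.Random(seed)
    samples = []
    for label, (mean, std) in stats.items():
        for _ in range(n_per_class):
            feature = [rng.gauss(m, s) for m, s in zip(mean, std)]
            samples.append((feature, label))
    return samples
```

In this sketch, the statistics would be collected for each class once its task has been learned, and the sampled pairs would serve as training data for a final pass over the classification layer only, leaving the representation layer untouched.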