Class-incremental learning is a challenging problem in which the goal is to train a model that can classify data from an increasing number of classes over time. Vision-language pre-trained models such as CLIP demonstrate strong generalization ability, allowing them to excel at class-incremental learning even with completely frozen parameters. However, further adapting the model to downstream tasks by simple fine-tuning leads to severe forgetting. Most existing works with pre-trained models assume that the forgetting of old classes is uniform as the model acquires new knowledge. In this paper, we propose a method named Adaptive Representation Adjustment and Parameter Fusion (RAPF). During training on new data, we measure the influence of new classes on old ones and adjust their representations using textual features. After training, we employ a decomposed parameter fusion to further mitigate forgetting during adapter module fine-tuning. Experiments on several conventional benchmarks show that our method achieves state-of-the-art results. Our code is available at \url{https://github.com/linlany/RAPF}.