Continual learning aims to enable a model to incrementally learn knowledge from sequentially arriving data. Previous works adopt the conventional classification architecture, which consists of a feature extractor and a classifier. The feature extractor is shared across sequentially arriving tasks or classes, but the classifier must be incrementally expanded with a new group of weights for each new class. Consequently, the number of parameters of a continual learner grows over time. Moreover, because the classifier covers all previously seen classes, a memory buffer of some size is usually required to store rehearsal data that mitigates classifier bias and catastrophic forgetting. In this paper, we propose a non-incremental learner, named AttriCLIP, to incrementally extract knowledge of new classes or tasks. Specifically, AttriCLIP is built upon the pre-trained visual-language model CLIP. Its image encoder and text encoder are frozen and used to extract features from images and text, respectively. The text input consists of a category name and a fixed number of learnable parameters, which are selected from our designed attribute word bank and serve as attributes. Because classification is performed by computing the similarity between visual and textual features, AttriCLIP is a non-incremental learner: no classifier weights need to be added for new classes. The attribute prompts, which encode common knowledge useful for classification, can effectively mitigate catastrophic forgetting and avoid the need to construct a replay memory. We evaluate AttriCLIP and compare it with CLIP-based and previous state-of-the-art continual learning methods in realistic settings with domain shift and long-sequence learning. The results show that our method performs favorably against previous state-of-the-art methods. The implementation code is available at https://github.com/bhrqw/AttriCLIP.
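To make the "non-incremental" idea concrete, the following is a minimal toy sketch in plain Python, not the authors' implementation. It assumes a hypothetical `AttributeBank` whose entries pair a key vector with a prompt vector, and it assumes attributes are chosen by matching the image feature against those keys (top-k cosine similarity); the abstract only states that attributes are selected from a word bank. `toy_text_encoder` is a hypothetical stand-in that averages vectors in place of CLIP's frozen text transformer, and all vectors are random placeholders rather than trained values. The point it illustrates: prediction is a similarity between an image feature and per-class text features, so adding a class only adds a class-name embedding, while the learnable parameter count stays fixed.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (assumes v is non-zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class AttributeBank:
    """Hypothetical attribute word bank: each entry pairs a key with a learnable prompt."""

    def __init__(self, num_attributes, dim, seed=0):
        rng = random.Random(seed)
        # In a real model these would be trained; here they are random placeholders.
        self.keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_attributes)]
        self.prompts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_attributes)]

    def num_parameters(self):
        """Total number of learnable scalars (keys plus prompts)."""
        return sum(len(v) for v in self.keys + self.prompts)

    def select(self, image_feat, top_k):
        # Assumption: choose the top_k attributes whose keys best match the image feature.
        scores = [cosine(image_feat, k) for k in self.keys]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [self.prompts[i] for i in order[:top_k]]


def toy_text_encoder(attribute_prompts, class_embedding):
    """Stand-in for CLIP's frozen text encoder: averages prompt and class-name vectors."""
    tokens = attribute_prompts + [class_embedding]
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(len(class_embedding))]


def classify(image_feat, class_embeddings, bank, top_k=2):
    """Return the class whose attribute-augmented text feature is most similar to the image."""
    prompts = bank.select(image_feat, top_k)
    scores = {
        name: cosine(image_feat, toy_text_encoder(prompts, emb))
        for name, emb in class_embeddings.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    bank = AttributeBank(num_attributes=10, dim=4)
    classes = {"cat": [100.0, 0.0, 0.0, 0.0], "dog": [0.0, 100.0, 0.0, 0.0]}
    image = [1.0, 0.0, 0.0, 0.0]  # toy image feature aligned with "cat"
    before = bank.num_parameters()

    label, _ = classify(image, classes, bank)
    print(label)  # cat

    # A new task arrives: only a class name (text) is added; no classifier weights grow.
    classes["bird"] = [0.0, 0.0, 100.0, 0.0]
    label, scores = classify(image, classes, bank)
    print(label, len(scores), bank.num_parameters() == before)  # cat 3 True
```

In this sketch the bank's size is set once, independent of how many classes arrive, which mirrors the abstract's contrast with classifiers whose weight matrix grows per class. Training of keys and prompts is deliberately omitted.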