Few-Shot Class Incremental Learning (FSCIL) requires a model to learn new classes incrementally, from only a few samples per class, without forgetting previously learned ones. FSCIL faces two significant challenges, catastrophic forgetting and overfitting, which have driven prior studies to rely primarily on shallow models such as ResNet-18. Although the limited capacity of such models can mitigate both forgetting and overfitting, it also leads to inadequate knowledge transfer during the few-shot incremental sessions. In this paper, we argue that large models, such as vision and language transformers pre-trained on large datasets, can be excellent few-shot incremental learners. To this end, we propose a novel FSCIL framework called PriViLege, Pre-trained Vision and Language transformers with prompting functions and knowledge distillation. Our framework effectively addresses catastrophic forgetting and overfitting in large models through a new pre-trained knowledge tuning (PKT) method and two losses: an entropy-based divergence loss and a semantic knowledge distillation loss. Experimental results show that PriViLege outperforms existing state-of-the-art methods by a large margin, e.g., +9.38% on CUB200, +20.58% on CIFAR-100, and +13.36% on miniImageNet. Our implementation code is available at https://github.com/KHU-AGI/PriViLege.