Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

Few-Shot Class Incremental Learning (FSCIL) is a task that requires a model to learn new classes incrementally, without forgetting previously learned ones, when only a few samples per class are given. FSCIL faces two significant challenges, catastrophic forgetting and overfitting, and these challenges have driven prior studies to rely primarily on shallow models such as ResNet-18. Although the limited capacity of such models can mitigate both forgetting and overfitting, it also leads to inadequate knowledge transfer during few-shot incremental sessions. In this paper, we argue that large models, such as vision and language transformers pre-trained on large datasets, can be excellent few-shot incremental learners. To this end, we propose a novel FSCIL framework called PriViLege: Pre-trained Vision and Language transformers with prompting functions and knowledge distillation. Our framework effectively addresses catastrophic forgetting and overfitting in large models through a new pre-trained knowledge tuning (PKT) scheme and two losses: an entropy-based divergence loss and a semantic knowledge distillation loss. Experimental results show that the proposed PriViLege outperforms existing state-of-the-art methods by a large margin, e.g., +9.38% on CUB200, +20.58% on CIFAR-100, and +13.36% on miniImageNet. Our implementation code is available at https://github.com/KHU-AGI/PriViLege.
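To make the FSCIL setting concrete, the following is a minimal sketch of how classes might be split into one base session followed by few-shot incremental sessions. It is not taken from the PriViLege code. The function names, the toy index, and the configuration (100 classes, 60 base classes, 5-way 5-shot increments) are assumptions chosen for illustration; the abstract does not state the split it uses.

```python
import random
from typing import Dict, List


def make_fscil_sessions(num_classes: int, num_base: int, way: int) -> List[List[int]]:
    """Split class ids into a base session plus `way`-class incremental sessions."""
    if (num_classes - num_base) % way != 0:
        raise ValueError("remaining classes must divide evenly into sessions")
    sessions = [list(range(num_base))]  # session 0: base classes, full training data
    for start in range(num_base, num_classes, way):
        sessions.append(list(range(start, start + way)))
    return sessions


def sample_few_shot(
    indices_by_class: Dict[int, List[int]],
    classes: List[int],
    shot: int,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """Pick `shot` training sample indices for each class of an incremental session."""
    rng = random.Random(seed)
    return {c: rng.sample(indices_by_class[c], shot) for c in classes}


# Hypothetical toy index: 100 classes with 500 sample indices each (assumed sizes).
toy_index = {c: list(range(c * 500, (c + 1) * 500)) for c in range(100)}

sessions = make_fscil_sessions(num_classes=100, num_base=60, way=5)
support = sample_few_shot(toy_index, sessions[1], shot=5)

# After session 1, the model is evaluated on every class seen so far,
# which is why forgetting of earlier classes shows up in the accuracy.
seen_after_session_1 = [c for s in sessions[:2] for c in s]
```

In this sketch only the base session has abundant data; each later session contributes just `way * shot` training samples, which is where the overfitting risk described above comes from.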

