Although few-shot class-incremental learning (FSCIL) has been studied extensively, achieving satisfactory performance, especially during incremental sessions, remains challenging. One prominent challenge is that the encoder, trained on an ample base-session training set, often underperforms in incremental sessions. In this study, we introduce a novel training framework for FSCIL that capitalizes on the generalizability of the Contrastive Language-Image Pre-training (CLIP) model to unseen classes. We achieve this by formulating image-object-specific (IOS) classifiers for the input images. Here, an IOS classifier is one that targets specific attributes of class objects (such as wings or wheels) rather than the image's background. To create these IOS classifiers, we encode a bias prompt into the classifiers using a specially designed module, which uses key-prompt pairs to pinpoint the IOS features of the classes in each session. From an FSCIL standpoint, our framework is structured to retain previous knowledge and adapt swiftly to new sessions without forgetting or overfitting. To this end, the design accounts for which modules are updatable in each session and incorporates several empirically found tricks for fast convergence. Our approach consistently outperforms state-of-the-art methods on the miniImageNet, CIFAR100, and CUB200 datasets. We also provide additional experiments to validate that the learned model does yield IOS classifiers, and we conduct ablation studies to analyze the impact of each module within the architecture.