Recent advances in deep learning have achieved performance comparable to human capabilities across a range of supervised computer vision tasks. However, the common assumption that an extensive pool of training data covering all classes is available before model training often diverges from real-world scenarios, where limited data for novel classes is the norm. The challenge is to integrate new classes with few samples seamlessly, so that the model accommodates them without compromising its performance on base classes. To address this challenge, the research community has introduced several solutions under the heading of few-shot class-incremental learning (FSCIL). In this study, we introduce an FSCIL framework that uses a language regularizer and a subspace regularizer. During base training, the language regularizer helps incorporate semantic information extracted from a vision-language model. During incremental training, the subspace regularizer helps the model learn the nuanced connections between image and text semantics inherent to the base classes. The proposed framework not only enables the model to accommodate novel classes with limited data, but also preserves its performance on base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, where our framework attains state-of-the-art performance.
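The abstract does not spell out the form of the subspace regularizer. As a rough illustration only, the sketch below assumes one common form of subspace regularization: penalizing the part of a novel-class weight vector that lies outside the span of the base-class weight vectors. The function name `subspace_penalty` and the toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np


def subspace_penalty(novel_weight, base_weights):
    """Squared distance from a novel-class weight to the base-class subspace.

    Assumed form (not confirmed by the abstract): the penalty is
    ||w - P w||^2, where P projects onto the span of the base-class weights.

    novel_weight: array of shape (d,)
    base_weights: array of shape (n_base, d), one row per base class
    """
    basis = base_weights.T  # shape (d, n_base); columns span the subspace
    # Least-squares coefficients give the orthogonal projection onto the span,
    # and remain valid when the base weights are linearly dependent.
    coeffs, *_ = np.linalg.lstsq(basis, novel_weight, rcond=None)
    projection = basis @ coeffs
    residual = novel_weight - projection
    return float(residual @ residual)


# Toy example: two base classes spanning the x-y plane in 3-D.
base = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0]])
in_plane = np.array([2.0, 3.0, 0.0])   # lies inside the base subspace
off_plane = np.array([1.0, 1.0, 1.0])  # has a component outside it

print(subspace_penalty(in_plane, base))
print(subspace_penalty(off_plane, base))
```

In a training loop, a term of this kind would typically be added to the classification loss with a weighting coefficient, so that novel-class weights stay close to the structure learned on the base classes. How the paper's regularizer combines image and text semantics is not specified here and is left out of the sketch.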