Few-Shot Class Incremental Learning (FSCIL) is a challenging continual learning task in which only limited training examples are available across several learning sessions. To succeed in this task, a model must avoid overfitting to new classes, which is caused by the biased distributions of the few-shot training sets. The general approach to this issue is to enhance the representational capability of a pre-defined backbone architecture by adding special modules for backward compatibility with older classes. However, this approach has not yet resolved the dilemma of maintaining high classification accuracy over time while narrowing the gap between the performance obtained on larger training sets and that obtained on smaller ones. In this work, we propose an alternative approach, Continual Parameter-Efficient CLIP (CPE-CLIP), to reduce the loss of information between learning sessions. Instead of adapting additional modules to address information loss, we leverage the vast knowledge acquired by CLIP during large-scale pre-training and its effectiveness in generalizing to new concepts. Our approach is multimodal and parameter-efficient: it relies on learnable prompts for both the language and vision encoders to enable transfer learning across sessions. We also introduce prompt regularization to improve performance and prevent forgetting. Our experimental results show that CPE-CLIP significantly improves FSCIL performance compared to state-of-the-art proposals while also drastically reducing the number of learnable parameters and training costs.