Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in the absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts, which can be inefficient at sharing knowledge across tasks, leading to inferior performance. Moreover, the lack of fine-grained, layer-specific prompts prevents these methods from fully exploiting the strength of prompting for CL. We address these limitations by proposing ConvPrompt, a novel convolutional prompt-creation mechanism that maintains layer-wise shared embeddings, enabling both layer-specific learning and better concept transfer across tasks. The judicious use of convolution keeps the parameter overhead low without compromising performance. We further leverage Large Language Models to generate fine-grained text descriptions of each category, which are used to estimate task similarity and dynamically decide the number of prompts to be learned. Extensive experiments demonstrate the superiority of ConvPrompt, which improves over SOTA by ~3% with significantly lower parameter overhead. We also perform thorough ablations over the various modules to disentangle the importance of the different components.
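To make the prompt-creation idea concrete, below is a minimal PyTorch sketch for a single transformer layer, assuming one shared embedding per layer and lightweight per-prompt 1-D convolutions over it. All names, shapes, and the exact convolution layout are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class ConvPromptGenerator(nn.Module):
    """Hypothetical sketch of convolutional prompt creation: a layer-wise
    shared embedding is reused across tasks, while small per-prompt 1-D
    convolutions are the only newly learned parameters."""

    def __init__(self, embed_dim=768, prompt_len=5, num_prompts=4, kernel_size=3):
        super().__init__()
        # Layer-wise shared embedding, common across tasks at this layer.
        self.shared_embed = nn.Parameter(torch.randn(prompt_len, embed_dim))
        # Lightweight convolutions generating prompts from the shared
        # embedding; their small kernels keep the parameter overhead low.
        self.prompt_convs = nn.ModuleList([
            nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)
            for _ in range(num_prompts)
        ])

    def forward(self):
        # (prompt_len, embed_dim) -> (prompt_len, 1, embed_dim) for Conv1d.
        x = self.shared_embed.unsqueeze(1)
        # Each conv yields one prompt of shape (prompt_len, embed_dim).
        prompts = [conv(x).squeeze(1) for conv in self.prompt_convs]
        return torch.stack(prompts)  # (num_prompts, prompt_len, embed_dim)
```

Because the shared embedding carries most of the capacity, adding a task amounts to learning a handful of small kernels, which is where the low parameter overhead comes from in this sketch.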
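The LLM-driven part can likewise be sketched, assuming class descriptions have already been embedded into vectors: the overlap between a new task's description embeddings and those of past tasks scales how many new prompts are learned. The function name, the max-over-old-classes aggregation, and the linear mapping from similarity to prompt count are all our assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def prompts_for_new_task(new_desc_emb, old_desc_emb, max_prompts=5):
    """Hypothetical sketch: pick the number of new prompts from the
    similarity between LLM-generated class-description embeddings.

    new_desc_emb: (n_new, d) embeddings of the new task's class descriptions
    old_desc_emb: (n_old, d) embeddings of all previous class descriptions
    """
    new_n = F.normalize(new_desc_emb, dim=-1)
    old_n = F.normalize(old_desc_emb, dim=-1)
    # Best cosine match among old classes for each new class, averaged.
    sim = (new_n @ old_n.T).max(dim=1).values.mean().clamp(0, 1)
    # High overlap with past tasks -> rely on shared knowledge and learn
    # fewer prompts; dissimilar tasks get closer to max_prompts.
    return max(1, int(round((1 - sim.item()) * max_prompts)))
```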