Continual learning (CL) with Vision-Language Models (VLMs) overcomes a constraint of traditional CL, which focuses only on previously encountered classes. During CL of a VLM, we must not only prevent catastrophic forgetting of incrementally learned knowledge but also preserve the VLM's zero-shot ability. However, existing methods require additional reference datasets to maintain this zero-shot ability and rely on domain-identity hints to classify images across different domains. In this study, we propose Regression-based Analytic Incremental Learning (RAIL), which uses a recursive ridge regression-based adapter to learn from a sequence of domains without forgetting and decouples cross-domain correlations by projecting features into a higher-dimensional space. Combined with a training-free fusion module, RAIL fully preserves the VLM's zero-shot ability on unseen domains without any reference data. Additionally, we introduce the Cross-domain Task-Agnostic Incremental Learning (X-TAIL) setting, in which a CL learner must incrementally learn from multiple domains and classify test images from both seen and unseen domains without any domain-identity hint. We theoretically prove RAIL's absolute memorization of incrementally learned domains. Experimental results confirm RAIL's state-of-the-art performance in both X-TAIL and the existing Multi-domain Task-Incremental Learning setting. The code will be released upon acceptance.
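To make the idea of a recursive ridge regression-based adapter concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a fixed random ReLU projection as a stand-in for the higher-dimensional feature projection, and it omits the fusion module entirely. The class name `RecursiveRidgeAdapter`, its methods, and the helper `demo` are hypothetical names introduced for illustration. The sketch shows the standard block recursive least-squares update, under which learning domain by domain gives the same weights as solving ridge regression once on all the data.

```python
import numpy as np


class RecursiveRidgeAdapter:
    """Illustrative recursive ridge regression on randomly expanded features."""

    def __init__(self, in_dim, hidden_dim, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: a frozen random projection lifts features to hidden_dim.
        self.proj = rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim)
        # R tracks (Phi^T Phi + ridge * I)^{-1} over all data seen so far.
        self.R = np.eye(hidden_dim) / ridge
        # W starts with zero class columns and grows as new classes arrive.
        self.W = np.zeros((hidden_dim, 0))

    def expand(self, X):
        # Random projection followed by ReLU.
        return np.maximum(X @ self.proj, 0.0)

    def fit_domain(self, X, Y):
        """Update with one domain. Y is one-hot over all classes seen so far,
        so it must have at least as many columns as the current W."""
        Phi = self.expand(X)
        new_cols = Y.shape[1] - self.W.shape[1]
        if new_cols > 0:
            # New classes start from zero weights.
            pad = np.zeros((self.W.shape[0], new_cols))
            self.W = np.hstack([self.W, pad])
        n = Phi.shape[0]
        # Woodbury identity: update the inverse without revisiting old data.
        K = np.linalg.solve(np.eye(n) + Phi @ self.R @ Phi.T, Phi @ self.R)
        self.R = self.R - self.R @ Phi.T @ K
        self.W = self.W + self.R @ Phi.T @ (Y - Phi @ self.W)

    def predict(self, X):
        return self.expand(X) @ self.W


def demo(ridge=1.0):
    """Compare incremental weights with the joint ridge solution."""
    rng = np.random.default_rng(1)
    X1 = rng.standard_normal((20, 8))
    X2 = rng.standard_normal((30, 8))
    Y1 = np.eye(3)[rng.integers(0, 3, 20)]  # domain 1: classes 0..2
    Y2 = np.eye(5)[rng.integers(3, 5, 30)]  # domain 2: classes 3..4

    adapter = RecursiveRidgeAdapter(in_dim=8, hidden_dim=64, ridge=ridge)
    adapter.fit_domain(X1, Y1)
    adapter.fit_domain(X2, Y2)

    # Joint solution on all data, with domain-1 labels zero-padded to 5 classes.
    Phi = adapter.expand(np.vstack([X1, X2]))
    Y = np.vstack([np.hstack([Y1, np.zeros((20, 2))]), Y2])
    W_joint = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(64), Phi.T @ Y)
    return adapter.W, W_joint
```

Calling `demo()` returns the incrementally learned weights and the jointly solved weights, which should agree up to floating-point error. This equivalence is the sense in which a recursive ridge solution does not forget earlier domains; the abstract's theoretical claim concerns RAIL itself, which this sketch only approximates.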