Continual learning (CL) can help pre-trained vision-language models efficiently adapt to new or under-trained data distributions without re-training. Nevertheless, during continual training of the Contrastive Language-Image Pre-training (CLIP) model, we observe that the model's zero-shot transfer ability degrades significantly due to catastrophic forgetting. Existing CL methods can mitigate forgetting by replaying previous data. However, since the CLIP dataset is private, replay methods cannot access the pre-training dataset. In addition, replaying data from previously learned downstream tasks can enhance performance on those tasks, but at the cost of zero-shot performance. To address this challenge, we propose a novel method, ZSCL, to prevent zero-shot transfer degradation in the continual learning of vision-language models in both feature and parameter space. In feature space, a reference dataset is introduced for distillation between the current and initial models. The reference dataset should be semantically diverse, but it need not be labeled, seen in pre-training, or composed of matched image-text pairs. In parameter space, we prevent a large parameter shift by averaging weights during training. We propose a more challenging Multi-domain Task Incremental Learning (MTIL) benchmark to evaluate different methods, where tasks come from various domains rather than from class splits of a single dataset. Our method outperforms other methods in the traditional class-incremental learning setting and on MTIL by 9.7% in average score. Our code is available at https://github.com/Thunderbeee/ZSCL.
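The two ingredients named above, distillation against the initial model in feature space and weight averaging in parameter space, can be illustrated with a minimal sketch. This is not the released implementation: the abstract does not specify the loss or the averaging schedule, so the cross-entropy form of the distillation loss, the plain running mean over weight snapshots, and all names (`distillation_loss`, `update_weight_average`, `temperature`, `num_averaged`) are assumptions made for illustration. Weights are represented as a flat dictionary of floats to keep the example self-contained.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Vector, teacher_logits: Vector,
                      temperature: float = 1.0) -> float:
    """Cross-entropy of the student's distribution against the teacher's.

    Assumption: the logits are similarity scores between one reference image
    and a set of reference texts, computed by the current (student) model and
    the initial (teacher) model. The exact loss is not given in the abstract.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def update_weight_average(average: Dict[str, float],
                          current: Dict[str, float],
                          num_averaged: int) -> Dict[str, float]:
    """Fold the current weights into a running mean of weight snapshots.

    `num_averaged` is the number of snapshots already contained in `average`.
    Assumption: a uniform running mean; other averaging schemes are possible.
    """
    return {
        name: (average[name] * num_averaged + current[name]) / (num_averaged + 1)
        for name in average
    }


# Usage: start from the initial weights and average in two later snapshots.
averaged = {"w": 1.0}
for step, value in enumerate([2.0, 3.0], start=1):
    averaged = update_weight_average(averaged, {"w": value}, step)
# averaged["w"] is now the mean of 1.0, 2.0 and 3.0
```

In this sketch the distillation term is smallest when the current model reproduces the initial model's similarity distribution on the reference sample, and the running mean keeps the averaged weights close to earlier snapshots, which is the intuition behind limiting parameter shift.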