Preventing Zero-Shot Transfer Degradation in Continual Learning of Vision-Language Models

Continual learning (CL) can help pre-trained vision-language models efficiently adapt to new or under-trained data distributions without re-training. Nevertheless, during the continual training of the Contrastive Language-Image Pre-training (CLIP) model, we observe that the model's zero-shot transfer ability significantly degrades due to catastrophic forgetting. Existing CL methods can mitigate forgetting by replaying previous data. However, since the CLIP dataset is private, replay methods cannot access the pre-training dataset. In addition, replaying data of previously learned downstream tasks can enhance their performance but comes at the cost of sacrificing zero-shot performance. To address this challenge, we propose a novel method ZSCL to prevent zero-shot transfer degradation in the continual learning of vision-language models in both feature and parameter space. In the feature space, a reference dataset is introduced for distillation between the current and initial models. The reference dataset should have semantic diversity but no need to be labeled, seen in pre-training, or matched image-text pairs. In parameter space, we prevent a large parameter shift by averaging weights during the training. We propose a more challenging Multi-domain Task Incremental Learning (MTIL) benchmark to evaluate different methods, where tasks are from various domains instead of class-separated in a single dataset. Our method outperforms other methods in the traditional class-incremental learning setting and the MTIL by 9.7% average score. Our code locates at https://github.com/Thunderbeee/ZSCL.

翻译：持续学习（CL）能够帮助预训练的视觉-语言模型高效适应新的或欠训练的数据分布，而无需重新训练。然而，在对对比语言-图像预训练（CLIP）模型进行持续训练时，我们观察到模型的零样本迁移能力会因灾难性遗忘而显著退化。现有的持续学习方法可通过重放先前数据来缓解遗忘。但由于CLIP数据集是私有的，重放方法无法访问预训练数据集。此外，重放先前学习过的下游任务数据虽能提升其性能，但会以牺牲零样本性能为代价。为解决这一挑战，我们提出了一种名为ZSCL的新方法，以在特征空间和参数空间中同时防止视觉-语言模型持续学习中的零样本迁移退化。在特征空间中，我们引入了一个参考数据集用于在当前模型与初始模型之间进行蒸馏。该参考数据集应具有语义多样性，但无需标注、无需出现在预训练中，也无需是匹配的图像-文本对。在参数空间中，我们通过在训练过程中对权重进行平均来防止较大的参数偏移。我们提出了一个更具挑战性的多域任务增量学习（MTIL）基准来评估不同方法，其中任务来自多个不同域，而非单一数据集中的类别划分。在传统的类增量学习设置和MTIL中，我们的方法以平均9.7%的得分超越了其他方法。我们的代码位于 https://github.com/Thunderbeee/ZSCL。

相关内容

Continuity

关注 4

让 iOS 8 和 OS X Yosemite 无缝切换的一个新特性。 > Apple products have always been designed to work together beautifully. But now they may really surprise you. With iOS 8 and OS X Yosemite, you’ll be able to do more wonderful things than ever before.

Source: Apple - iOS 8

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日