Large language models (LLMs) and multimodal models (MMs) have exhibited impressive capabilities in various domains, particularly in general language understanding and visual reasoning. However, these models, trained on massive data, may not be finely optimized for specific tasks triggered by instructions. Continual instruction tuning is crucial to adapt a large model to evolving tasks and domains, ensuring its effectiveness and relevance across a wide range of applications. In continual instruction tuning, where models are trained sequentially on different tasks, catastrophic forgetting can occur, degrading performance on previously learned tasks. This work addresses catastrophic forgetting in continual instruction learning through a switching mechanism that routes computations to parameter-efficiently tuned models. We demonstrate the effectiveness of our method through experiments on continual instruction tuning across natural language generation and vision-language tasks. We also showcase the advantages of our proposed method in terms of efficiency, scalability, portability, and privacy preservation.
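To make the routing idea concrete, the following is a minimal sketch, not the paper's implementation: a frozen base projection with one LoRA-style parameter-efficient adapter per task, plus a switch that dispatches each input to the adapter of its task. The class names, the string task ids, and the rank value are all illustrative assumptions; only the general pattern (frozen backbone, per-task low-rank adapters, route-by-task) reflects the mechanism the abstract describes.

```python
# Minimal sketch (hypothetical names, not the paper's code): a frozen base
# linear layer, one LoRA-style adapter per task, and a switch that routes
# each input to its task's adapter. Old adapters are never updated, so
# earlier tasks are not overwritten -- the anti-forgetting idea above.
import torch
import torch.nn as nn


class LoRAAdapter(nn.Module):
    """Low-rank delta (B @ A) added on top of a frozen base projection."""

    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero-init: no-op at start

    def forward(self, x):
        return x @ self.A.T @ self.B.T


class SwitchedLinear(nn.Module):
    """Frozen base layer plus a bank of per-task adapters and a switch."""

    def __init__(self, d_in, d_out):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)  # backbone stays frozen
        self.base.bias.requires_grad_(False)
        self.adapters = nn.ModuleDict()

    def add_task(self, task_id, rank=8):
        # A new task only adds parameters; existing adapters are untouched.
        self.adapters[task_id] = LoRAAdapter(
            self.base.in_features, self.base.out_features, rank
        )

    def forward(self, x, task_id):
        out = self.base(x)
        if task_id in self.adapters:  # the "switch": route to the task's adapter
            out = out + self.adapters[task_id](x)
        return out


layer = SwitchedLinear(16, 16)
layer.add_task("summarization")
layer.add_task("vqa")
x = torch.randn(4, 16)
print(layer(x, "vqa").shape)  # torch.Size([4, 16])
```

Because each task's parameters live in a separate adapter, adapters can also be trained, shipped, or withheld independently, which is one plausible reading of the efficiency, portability, and privacy advantages claimed above.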