Continual learning (CL) aims to continually accumulate knowledge from a non-stationary data stream without catastrophically forgetting what has already been learned, which requires balancing stability and plasticity. Leveraging the generalizable representations of pre-trained models (PTMs), PTM-based CL methods adapt effectively to downstream tasks by adding learnable adapters or prompts on top of the frozen PTMs. However, many existing PTM-based CL methods restrict adaptation to a fixed set of these modules to avoid forgetting, which limits their CL ability, while periodically adding task-specific modules leads to a linear model growth rate and impaired knowledge reuse. We propose Self-Expansion of pre-trained models with Modularized Adaptation (SEMA), a novel approach that improves control of the stability-plasticity balance in PTM-based CL. SEMA automatically decides whether to reuse or add adapter modules on demand during CL, depending on whether a significant distribution shift that cannot be handled is detected at different representation levels. We design a modular adapter consisting of a functional adapter and a representation descriptor; the representation descriptor is trained as a distribution-shift indicator and used to trigger self-expansion signals. To better compose the adapters, an expandable weighting router is learned jointly to mix the adapter outputs. SEMA enables better knowledge reuse and a sub-linear expansion rate. Extensive experiments demonstrate the effectiveness of the proposed self-expansion method, which achieves state-of-the-art performance compared with PTM-based CL methods without memory rehearsal. Code is available at https://github.com/huiyiwang01/SEMA-CL.
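The reuse-or-expand decision described above can be illustrated with a minimal sketch. This is not the paper's implementation: the mean-distance descriptor, the fixed novelty threshold, and the softmax-over-negative-error router are simplifying assumptions standing in for the trained representation descriptors and the learned expandable router.

```python
import math


class ModularAdapter:
    """Hypothetical modular adapter: a functional adapter paired with a
    representation descriptor modeling the features it was added for.
    Here the descriptor is simply the stored feature mean."""

    def __init__(self, mean):
        self.mean = list(mean)  # descriptor state captured at expansion time

    def descriptor_error(self, x):
        # Novelty score: Euclidean distance of x from the stored mean.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, self.mean)))

    def forward(self, x):
        # Placeholder functional adapter: shift input by the stored mean.
        return [a + b for a, b in zip(x, self.mean)]


class SelfExpandingLayer:
    """Sketch of a reuse-or-expand decision at one representation level."""

    def __init__(self, threshold):
        self.adapters = []
        self.threshold = threshold

    def maybe_expand(self, x):
        # Expand only if no existing descriptor explains x well enough,
        # i.e. every adapter reports a large distribution shift.
        if all(a.descriptor_error(x) > self.threshold for a in self.adapters):
            self.adapters.append(ModularAdapter(x))
            return True  # self-expansion signal triggered
        return False  # reuse existing adapters

    def forward(self, x):
        # Router stand-in: softmax over negative descriptor errors
        # produces mixture weights for the adapter outputs.
        errs = [a.descriptor_error(x) for a in self.adapters]
        ws = [math.exp(-e) for e in errs]
        z = sum(ws)
        ws = [w / z for w in ws]
        out = [0.0] * len(x)
        for w, a in zip(ws, self.adapters):
            for i, v in enumerate(a.forward(x)):
                out[i] += w * v
        return out
```

A small usage example: the first batch always triggers expansion, nearby features reuse the existing adapter, and a large shift adds a second module, giving growth only when new distributions appear.

```python
layer = SelfExpandingLayer(threshold=1.0)
layer.maybe_expand([0.0, 0.0])   # True: first distribution, expand
layer.maybe_expand([0.1, 0.1])   # False: small shift, reuse
layer.maybe_expand([5.0, 5.0])   # True: large shift, expand again
```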