The continual learning (CL) ability is vital for deploying large language models (LLMs) in the dynamic world. Based on parameter-efficient tuning (PET), existing methods devise a learning module and a selection module to handle the challenges of catastrophic forgetting (CF) and knowledge transfer (KT) in CL. The learning module allocates a separate PET block for each continually emerging task, and the selection module functions to choose the correct block for the input at test time. However, both modules have limitations in their designs, and existing methods ignore the potential of aligning the two modules to address CF and KT simultaneously. To this end, we propose a novel Dual Attention framework (DAPT), which aligns PET learning and selection via the Dual Attentive Learning \& Selection module. Extensive experiments on two CL benchmarks demonstrate the superiority of DAPT in resisting CF and facilitating KT at the same time. Moreover, DAPT maintains this superiority when scaled to different model sizes (from 770M to 11B) and to unseen tasks.