ConPET: Continual Parameter-Efficient Tuning for Large Language Models

From arXiv. 12 pages, 3 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Continual learning requires models to adapt continually to newly emerging tasks while minimizing catastrophic forgetting of old ones. This is extremely challenging for large language models (LLMs) under vanilla full-parameter tuning because of high computation costs, memory consumption, and forgetting. Inspired by the success of parameter-efficient tuning (PET), we propose Continual Parameter-Efficient Tuning (ConPET), a generalizable paradigm for continual task adaptation of LLMs with task-number-independent training complexity. ConPET comes in two versions for different application scenarios. First, Static ConPET adapts former continual learning methods, originally designed for relatively smaller models, to LLMs through PET and a dynamic replay strategy, which largely reduces tuning costs and alleviates over-fitting and forgetting. Second, to maintain scalability, Dynamic ConPET adopts separate PET modules for different tasks and a PET module selector that dynamically chooses the best modules. In our extensive experiments, adapting multiple former methods with Static ConPET reduces their scale of tunable parameters by over 3,000 times and surpasses the PET-only baseline by at least 5 points on five smaller benchmarks, while Dynamic ConPET shows its advantage on the largest dataset. The code and datasets are available at https://github.com/Raincleared-Song/ConPET.
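The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `build_epoch` and `select_modules`, the `replay_budget` and `top_k` parameters, the `(text, label)` example format, and the top-k selection rule are all assumptions made for illustration. The first function caps the number of replayed old examples per epoch, so the training cost does not grow with the number of tasks seen so far; the second picks which task-specific PET modules to activate from hypothetical selector scores.

```python
import random
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label): hypothetical example format


def build_epoch(
    new_task: Sequence[Example],
    memory: Dict[int, List[Example]],
    replay_budget: int,
    rng: random.Random,
) -> List[Example]:
    """Mix all new-task examples with at most `replay_budget` replayed ones.

    Assumption: the replay size is capped by a fixed budget, so the epoch
    length is independent of how many old tasks are stored in `memory`.
    """
    # Flatten the stored examples of every previous task into one pool.
    old_pool = [ex for examples in memory.values() for ex in examples]
    k = min(replay_budget, len(old_pool))
    # Re-sample the replayed subset each time this function is called.
    replayed = rng.sample(old_pool, k)
    epoch = list(new_task) + replayed
    rng.shuffle(epoch)
    return epoch


def select_modules(selector_scores: Dict[int, float], top_k: int) -> List[int]:
    """Return the ids of the `top_k` highest-scoring PET modules.

    Assumption: the selector emits one score per task-specific module.
    """
    ranked = sorted(selector_scores, key=lambda m: selector_scores[m], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    rng = random.Random(0)
    memory = {0: [("old a", 0), ("old b", 1)], 1: [("old c", 2), ("old d", 3)]}
    new_task = [("new a", 4), ("new b", 5), ("new c", 4)]
    epoch = build_epoch(new_task, memory, replay_budget=2, rng=rng)
    print(len(epoch))
    print(select_modules({0: 0.1, 1: 0.7, 2: 0.2}, top_k=2))
```

With a budget of 2, the epoch holds the three new examples plus two replayed ones no matter how many old tasks `memory` contains, which is the sense in which the cost is independent of the task count in this sketch.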

