The continual learning (CL) ability is vital for deploying large language models (LLMs) in a dynamic world. Existing methods pair a learning module, which acquires task-specific knowledge with a parameter-efficient tuning (PET) block, with a selection module, which picks out the corresponding block for each testing input, aiming to handle the challenges of catastrophic forgetting and knowledge transfer in CL. However, these methods tend to address only one of the two challenges, ignoring the potential of aligning the two modules to address catastrophic forgetting and knowledge transfer simultaneously. To this end, we propose a novel Shared Attention Framework (SAPT) that aligns PET learning and selection via the Shared Attentive Learning & Selection module. Extensive experiments on two CL benchmarks demonstrate the superiority of SAPT. Moreover, SAPT remains superior when scaled to different model sizes (from 770M to 13B), different model architectures (T5 and LLaMA-2), and unseen tasks.
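The idea of aligning learning and selection through one set of attention weights can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the attention is computed, so the scaled dot-product scoring, the per-task key vectors, and the names `shared_attention` and `mix_pet_blocks` are all assumptions made for illustration. Each PET block is reduced to a flat list of numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shared_attention(query, keys):
    """Attention weights of one input query over per-task keys.

    Assumption: scaled dot-product scoring; the abstract does not
    state the actual scoring function.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


def mix_pet_blocks(weights, pet_blocks):
    """Weighted combination of PET blocks (hypothetical flat vectors).

    The same weights would be used when training a new block and when
    choosing blocks for a test input, which is the "alignment" this
    sketch tries to convey.
    """
    dim = len(pet_blocks[0])
    return [
        sum(w * block[j] for w, block in zip(weights, pet_blocks))
        for j in range(dim)
    ]


# Toy data: three tasks, 2-d keys, 3-d PET blocks (all values made up).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pet_blocks = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
query = [2.0, 0.0]

weights = shared_attention(query, keys)
mixed = mix_pet_blocks(weights, pet_blocks)
```

In this toy setup the query aligns equally with the first and third keys, so those two blocks receive equal weight and the second block receives less. Because a single weight vector drives both the mixing during learning and the choice at test time, the two steps cannot drift apart, which is the intuition behind addressing forgetting and transfer together.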