Lifelong learning (LL) is an important ability that allows NLP models to learn new tasks continuously. Architecture-based approaches have been reported to be effective for implementing LL models. However, it is non-trivial to extend existing approaches to domain-incremental LL scenarios, since they either require access to task identities in the testing phase or cannot handle samples from unseen tasks. In this paper, we propose \textbf{Diana}: a \underline{d}ynam\underline{i}c \underline{a}rchitecture-based lifelo\underline{n}g le\underline{a}rning model that learns a sequence of tasks with a prompt-enhanced language model. Diana uses four types of hierarchically organized prompts to capture knowledge at different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge, which retains high LL performance, and maintain instance-level prompts to learn knowledge shared across input samples, which improves the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art LL models, especially in handling unseen tasks. We release the code and data at \url{https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/diana}.