QA models with lifelong learning (LL) abilities are important for practical QA applications, and architecture-based LL methods are reported to be an effective way to implement them. However, extending previous approaches to QA tasks is non-trivial, since they either require access to task identities at test time or do not explicitly model samples from unseen tasks. In this paper, we propose Diana: a dynamic architecture-based lifelong QA model that learns a sequence of QA tasks with a prompt-enhanced language model. Diana uses four types of hierarchically organized prompts to capture QA knowledge at different granularities. Specifically, we dedicate task-level prompts to capturing task-specific knowledge, which retains high LL performance, and maintain instance-level prompts to learn knowledge shared across different input samples, which improves the model's generalization. Moreover, we dedicate separate prompts to explicitly modeling unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art lifelong QA models, especially in handling unseen tasks.
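
To make the role of prompt key vectors more concrete, the sketch below shows one way a model could pick a task-level prompt without being told the task identity at test time. The abstract does not specify the matching rule, so everything here is an assumption made for illustration: the cosine-similarity lookup, the `threshold` fallback to an "unseen" prompt, and the names `cosine`, `select_task_prompt`, and `keys` are hypothetical and are not taken from Diana itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_task_prompt(query, prompt_keys, unseen_label="unseen", threshold=0.5):
    """Return the label of the best-matching prompt key.

    Assumption for illustration: if no key is more similar to the query
    than `threshold`, the input is routed to a separate prompt reserved
    for unseen tasks.
    """
    best_label, best_sim = unseen_label, threshold
    for label, key in prompt_keys.items():
        sim = cosine(query, key)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Hypothetical key vectors for two previously learned tasks.
keys = {"task_a": [1.0, 0.0], "task_b": [0.0, 1.0]}

print(select_task_prompt([0.9, 0.1], keys))    # close to task_a's key
print(select_task_prompt([-1.0, -1.0], keys))  # far from every key
```

In this toy setup the first query is routed to `task_a` and the second falls back to the unseen-task prompt. In a real system the query and key vectors would presumably be learned representations, not hand-written lists.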