Pre-trained language models (PLMs) have achieved impressive results on a wide range of NLP tasks. It has been shown that one key factor in their success is that their parameters implicitly learn many kinds of knowledge during pre-training. However, encoding knowledge implicitly in model parameters has two fundamental drawbacks. First, the knowledge is neither editable nor scalable once the model is trained, which is especially problematic because knowledge is constantly evolving. Second, the encoding lacks interpretability and prevents humans from understanding which knowledge a PLM requires for a given problem. In this paper, we introduce PlugLM, a pre-trained model with a differentiable plug-in memory (DPM). The key intuition is to decouple knowledge storage from model parameters by means of an editable and scalable key-value memory, and to leverage knowledge in an explainable manner through knowledge retrieval in the DPM. To justify this design choice, we conduct evaluations in three settings: (1) Domain adaptation. PlugLM obtains an average improvement of 3.95 F1 points across four domains without any in-domain pre-training. (2) Knowledge update. PlugLM can absorb new knowledge in a training-free way after pre-training is done. (3) In-task knowledge learning. PlugLM can be further improved by incorporating training samples into the DPM with knowledge prompting.
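To make the idea of decoupling knowledge storage from model parameters concrete, the following is a minimal sketch of a key-value memory with top-k retrieval. It is not the PlugLM implementation: the class `PlugInMemory`, its methods, the dot-product scoring, and the toy two-dimensional vectors are all assumptions made for illustration. In a real model the keys, values, and query would be produced by an encoder, and the softmax weighting would be computed in an autograd framework so that gradients can flow through the retrieval step.

```python
import math


class PlugInMemory:
    """Toy key-value memory kept outside the model parameters (hypothetical sketch)."""

    def __init__(self):
        self.keys = []    # one key vector per knowledge entry
        self.values = []  # one value vector per knowledge entry
        self.texts = []   # human-readable source text, kept for inspection

    def add(self, key, value, text):
        # Adding knowledge only appends an entry; no parameter is retrained.
        self.keys.append(list(key))
        self.values.append(list(value))
        self.texts.append(text)

    def retrieve(self, query, top_k=2):
        if not self.keys:
            raise ValueError("memory is empty")
        # Score every entry by the dot product between query and key.
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        # Keep the indices of the top_k highest-scoring entries.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        # Softmax over the retained scores (shifted by the max for stability).
        peak = max(scores[i] for i in top)
        exps = [math.exp(scores[i] - peak) for i in top]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Fuse the retrieved values as a weighted sum.
        dim = len(self.values[0])
        fused = [sum(w * self.values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
        # Return the fused vector plus the retrieved texts and their weights.
        evidence = [(self.texts[i], w) for i, w in zip(top, weights)]
        return fused, evidence


memory = PlugInMemory()
memory.add([1.0, 0.0], [0.5, 0.5], "entry A")
memory.add([0.0, 1.0], [1.0, 0.0], "entry B")
fused_before, evidence_before = memory.retrieve([2.0, 0.0], top_k=1)

# Training-free update: a new entry changes what is retrieved for the same query.
memory.add([3.0, 0.0], [0.0, 1.0], "entry C")
fused_after, evidence_after = memory.retrieve([2.0, 0.0], top_k=1)
```

In this sketch the returned `evidence` list is what makes the lookup inspectable: it names the entries that contributed to the fused vector and their weights. The second `retrieve` call illustrates the knowledge-update setting in miniature, since the result changes after `add` without touching any model weights.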