Pre-trained language models (PLMs) have achieved impressive results in various NLP tasks. It has been revealed that one of the key factors in their success is that the parameters of these models implicitly learn all kinds of knowledge during pre-training. However, encoding knowledge implicitly in model parameters has two fundamental drawbacks. First, the knowledge is neither editable nor scalable once the model is trained, which is especially problematic given that knowledge is constantly evolving. Second, it lacks interpretability and prevents humans from understanding which knowledge a PLM requires for a given problem. In this paper, we introduce PlugLM, a pre-trained model with a differentiable plug-in memory (DPM). The key intuition is to decouple knowledge storage from model parameters by using an editable and scalable key-value memory, and to leverage knowledge in an explainable manner through knowledge retrieval in the DPM. To justify this design choice, we conduct evaluations in three settings: (1) Domain adaptation: PlugLM obtains an average improvement of 3.95 F1 across four domains without any in-domain pre-training. (2) Knowledge update: PlugLM can absorb new knowledge in a training-free way after pre-training is done. (3) In-task knowledge learning: PlugLM can be further improved by incorporating training samples into the DPM with knowledge prompting.
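The core mechanism described above is a key-value memory that is queried by retrieval and can be edited without retraining. It can be illustrated with the toy sketch below. This is not PlugLM's implementation. The class name `KeyValueMemory`, dot-product scoring, top-k selection, and softmax weighting are assumptions chosen for illustration. Plain Python lists stand in for the learned encoder outputs and the gradient-based training a real DPM would use.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class KeyValueMemory:
    """Hypothetical plug-in key-value memory (illustrative sketch, not PlugLM's code)."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def add(self, key: Vector, value: Vector) -> None:
        # Training-free knowledge update: simply append a new entry.
        self.keys.append(key)
        self.values.append(value)

    def remove(self, index: int) -> None:
        # Editing: drop an outdated entry without touching any model weights.
        del self.keys[index]
        del self.values[index]

    def retrieve(self, query: Vector, top_k: int = 2) -> Tuple[Vector, List[int]]:
        if not self.keys:
            raise ValueError("memory is empty")
        # Score every key by its dot product with the query.
        scores = [dot(query, k) for k in self.keys]
        # Keep the indices of the top-k highest-scoring entries.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        # Softmax over the selected scores (subtract the max for numerical stability).
        m = max(scores[i] for i in top)
        exps = [math.exp(scores[i] - m) for i in top]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Soft-weighted sum of retrieved values; the indices show which entries were used.
        dim = len(self.values[0])
        out = [0.0] * dim
        for w, i in zip(weights, top):
            for d in range(dim):
                out[d] += w * self.values[i][d]
        return out, top


mem = KeyValueMemory()
mem.add([1.0, 0.0], [10.0, 0.0])
mem.add([0.0, 1.0], [0.0, 10.0])
vec, idx = mem.retrieve([5.0, 0.0], top_k=1)
print(idx, vec)  # [0] [10.0, 0.0]
```

Returning the retrieved indices alongside the output mirrors the interpretability argument. A reader can see which memory entries influenced a prediction. Adding or removing entries changes behavior without any gradient step, which mirrors the editability argument.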