Previous studies have revealed that vanilla pre-trained language models (PLMs) lack the capacity to handle knowledge-intensive NLP tasks alone; thus, several works have attempted to integrate external knowledge into PLMs. However, despite the promising outcomes, we empirically observe that PLMs may have already encoded rich knowledge in their pre-trained parameters but fail to fully utilize it when applied to knowledge-intensive tasks. In this paper, we propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize this related latent knowledge without retrieving it from an external corpus. By simply adding a prompt like "As far as I know" to the PLMs, we try to review related latent knowledge and inject it back into the model for knowledge consolidation. We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, GPT-3 and OPT. Experimental results on six commonsense reasoning tasks and the GLUE benchmark demonstrate the effectiveness of our proposed approach, which further proves that the knowledge stored in PLMs can be better exploited to enhance downstream performance. Code will be available at https://github.com/zjunlp/knowledge-rumination.
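As a rough illustration of the two-step idea described above (first elicit latent knowledge with a prompt such as "As far as I know", then feed it back to the model), the following minimal sketch works purely at the text level. It is not the authors' implementation: the abstract does not specify how the knowledge is injected, so plain concatenation is assumed here, and the names `build_rumination_input`, `ruminate_then_answer`, and the `generate` callable are hypothetical stand-ins for whatever language model interface is used.

```python
from typing import Callable

# Prompt quoted in the abstract; everything else below is an illustrative assumption.
RUMINATION_PROMPT = "As far as I know"


def build_rumination_input(question: str, prompt: str = RUMINATION_PROMPT) -> str:
    """Append the rumination prompt so the model continues with what it 'knows'."""
    return f"{question.strip()} {prompt}"


def ruminate_then_answer(question: str, generate: Callable[[str], str]) -> str:
    """Two-pass sketch: elicit latent knowledge, then answer with it as context.

    `generate` is a hypothetical callable mapping an input string to the
    model's continuation; it stands in for any PLM text-generation API.
    """
    # Pass 1: ask the model to review related latent knowledge.
    knowledge = generate(build_rumination_input(question)).strip()
    # Pass 2: feed the elicited knowledge back (assumed here: simple concatenation).
    consolidated = f"{question.strip()}\nKnowledge: {knowledge}\nAnswer:"
    return generate(consolidated).strip()
```

Any callable that maps a prompt string to a continuation can be passed as `generate`, so the sketch can be tried with a stub function before wiring in a real model.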