Knowledge-enhanced pre-trained language models (KEPLMs) use external knowledge to improve language understanding. Previous language models acquired knowledge through knowledge-related pre-training tasks built from relation triples in knowledge graphs. However, these models do not prioritize learning embeddings for entity-related tokens, and updating the full set of parameters in a KEPLM is computationally demanding. This paper introduces TRELM, a Robust and Efficient Pre-training framework for Knowledge-Enhanced Language Models. We observe that entities in text corpora usually follow a long-tail distribution, so the representations of some entities are poorly optimized and hinder KEPLM pre-training. To address this, we inject knowledge triples with a robust approach and use a knowledge-augmented memory bank to capture valuable information. Furthermore, updating only a small subset of the neurons in the feed-forward networks (FFNs) that store factual knowledge is both sufficient and efficient. Specifically, we use dynamic knowledge routing to identify knowledge paths in the FFNs and selectively update parameters during pre-training. Experimental results show that TRELM reduces pre-training time by at least 50% and outperforms other KEPLMs on knowledge probing tasks and on multiple knowledge-aware language understanding tasks.
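The idea of selectively updating FFN neurons can be illustrated with a minimal sketch. The abstract does not specify how dynamic knowledge routing scores neurons, so the per-neuron `scores` below are a placeholder assumption, and the helper names `select_neurons` and `masked_sgd_step` are hypothetical. The sketch only shows the general pattern of ranking neurons, keeping a small fraction, and applying gradient updates to those neurons alone; it is not TRELM's actual procedure.

```python
def select_neurons(scores, keep_ratio):
    """Return the indices of the top-scoring FFN neurons.

    `scores` is an assumed per-neuron importance value; how TRELM
    actually derives it is not described in the abstract.
    """
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def masked_sgd_step(w_in, grad_in, selected, lr=0.1):
    """Apply a plain SGD step only to the columns of selected neurons.

    w_in[i][j] is the weight from input dimension i to FFN neuron j.
    Weights of unselected neurons are returned unchanged.
    """
    return [
        [w - lr * g if j in selected else w
         for j, (w, g) in enumerate(zip(w_row, g_row))]
        for w_row, g_row in zip(w_in, grad_in)
    ]


# Toy example: 2 input dimensions, 4 FFN neurons.
scores = [0.1, 0.9, 0.3, 0.7]           # placeholder importance scores
w_in = [[1.0] * 4, [2.0] * 4]           # toy weights
grad_in = [[1.0] * 4, [1.0] * 4]        # toy gradients

selected = select_neurons(scores, keep_ratio=0.5)
new_w_in = masked_sgd_step(w_in, grad_in, selected, lr=0.5)
```

In this toy run only neurons 1 and 3 are updated; the other columns keep their original weights, which is where the compute savings of a selective update would come from in a real training loop.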