Masked language modeling (MLM) plays a key role in pretraining large language models. However, the MLM objective is often dominated by high-frequency words, which are sub-optimal for learning factual knowledge. In this work, we propose an approach for influencing MLM pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks. We force the language model to prioritize informative words in a fully unsupervised way. Experiments demonstrate that the proposed approach can significantly improve the performance of pretrained language models on tasks such as factual recall, question answering, sentiment analysis, and natural language inference in a closed-book setting.