Language models retain a significant amount of world knowledge from pre-training. This knowledge allows them to be applied to knowledge-intensive tasks prevalent in information retrieval, such as ranking and question answering. Understanding how factual information is acquired, and which facts are acquired, is necessary for building responsible models. However, limited work has examined how the choice of pre-training task affects the amount of knowledge a language model captures and forgets during pre-training. This paper aims to build a better understanding of knowledge acquisition. To this end, we use a selection of pre-training tasks to infuse knowledge into a model and then test its knowledge retention by measuring its ability to answer factual questions. Our experiments show that masking entities and principled masking of correlated spans based on pointwise mutual information lead to more factual knowledge being retained than masking random tokens. Our findings further demonstrate that, like the ability to perform a task, the factual knowledge acquired from training on that task is forgotten when the model is trained to perform another task (catastrophic forgetting), and we show how to prevent this phenomenon. To foster reproducibility, the code and data used in this paper are openly available.
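To make the idea of masking correlated spans by pointwise mutual information (PMI) concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes whitespace-tokenized sentences, restricts spans to adjacent token pairs, and scores each pair with PMI(x, y) = log(p(x, y) / (p(x) p(y))) estimated from raw counts. The function names `bigram_pmi` and `mask_highest_pmi_pair` and the `[MASK]` token are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(sentences):
    """Return {(w1, w2): PMI} for adjacent token pairs in tokenized sentences.

    Probabilities are plain relative frequencies (an assumption of this
    sketch): unigram counts over all tokens, pair counts over all adjacent pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi[(w1, w2)] = math.log(p_xy / (p_x * p_y))
    return pmi


def mask_highest_pmi_pair(tokens, pmi, mask_token="[MASK]"):
    """Mask the adjacent pair in `tokens` with the highest PMI score.

    Pairs missing from `pmi` are treated as having the lowest possible score.
    """
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return list(tokens)
    best = max(range(len(pairs)), key=lambda i: pmi.get(pairs[i], float("-inf")))
    masked = list(tokens)
    masked[best] = masked[best + 1] = mask_token
    return masked
```

In a small corpus where "new york" always co-occurs, that pair receives a higher PMI than pairs involving a frequent word such as "is", so it is masked as a unit. Masking random tokens, by contrast, could hide only "york" and leave "new" as an easy local cue.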