Pre-trained language models (LMs) are used for knowledge-intensive tasks like question answering, but their knowledge becomes outdated as the world changes. Prior work has studied targeted updates to LMs that inject individual facts, evaluating whether the model learns these facts without changing its predictions in other contexts. We take a step forward and study LMs' abilities to make inferences based on injected facts (that is, to propagate those facts): for example, after learning that something is a TV show, does an LM predict that you can watch it? We study this with two cloze-style tasks: an existing dataset of real-world sentences about novel entities (ECBD) and a new controlled benchmark of manually designed templates that require varying levels of inference about injected knowledge. Surprisingly, we find that existing methods for updating knowledge (gradient-based fine-tuning and modifications of this approach) show little propagation of injected knowledge: they improve performance on cloze instances only when there is lexical overlap between the injected facts and the target inferences. Yet prepending entity definitions to an LM's context improves performance across all settings, suggesting that there is substantial headroom for parameter-updating approaches to knowledge injection.
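To make two ideas from the abstract concrete (splitting cloze instances by lexical overlap between the injected fact and the target, and building the "definition prepended to the context" input), here is a minimal Python sketch. It is not the authors' evaluation code. It assumes that a cloze instance can be represented as an (entity, definition, probe, target) record with a `<MASK>` slot in the probe. The class and function names, the tiny stopword list, the regex tokenizer, and the fictional entity "Zorblax" are all hypothetical illustrations. No language model is involved: the sketch only prepares and partitions inputs.

```python
import re
from dataclasses import dataclass


@dataclass
class ClozeInstance:
    """Hypothetical record for one cloze probe about a newly injected entity."""
    entity: str      # the new entity
    definition: str  # injected fact, e.g. a one-sentence definition
    probe: str       # cloze context containing a <MASK> slot
    target: str      # gold span for the slot


# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "it", "and", "on", "for", "can", "you"}


def content_tokens(text: str) -> set:
    """Lowercased alphanumeric tokens, minus the stopword list."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def has_lexical_overlap(inst: ClozeInstance) -> bool:
    """True if any content word of the target span also appears in the definition."""
    return bool(content_tokens(inst.target) & content_tokens(inst.definition))


def with_prepended_definition(inst: ClozeInstance) -> str:
    """Input for the in-context baseline: the definition followed by the probe."""
    return f"{inst.definition} {inst.probe}"


def split_by_overlap(instances):
    """Partition instances into (overlap, no_overlap) lists."""
    overlap, no_overlap = [], []
    for inst in instances:
        (overlap if has_lexical_overlap(inst) else no_overlap).append(inst)
    return overlap, no_overlap


# Illustrative examples built around a fictional entity.
DEFINITION = "Zorblax is a TV show about office workers."
examples = [
    # Requires inference: "watch" never appears in the definition.
    ClozeInstance("Zorblax", DEFINITION, "Tonight we will <MASK> Zorblax.", "watch"),
    # Lexical overlap: "show" is copied directly from the definition.
    ClozeInstance("Zorblax", DEFINITION, "Zorblax is a popular TV <MASK>.", "show"),
]

if __name__ == "__main__":
    overlap, no_overlap = split_by_overlap(examples)
    print("overlap:", [i.target for i in overlap])
    print("no overlap:", [i.target for i in no_overlap])
    print(with_prepended_definition(examples[0]))
```

Reporting results separately on the two partitions is one way to check whether an update method only helps when the target span can be copied from the injected text, which is the pattern the abstract describes for gradient-based updates.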