Modern language models have the capacity to store and use immense amounts of knowledge about real-world entities, but it remains unclear how to update their implicit "knowledge bases." While prior methods for updating knowledge in LMs successfully inject facts, the updated LMs then fail to make inferences based on these injected facts. In this work, we demonstrate that a context distillation-based approach can both impart knowledge about entities and propagate that knowledge to enable broader inferences. Our approach consists of two stages: transfer set generation and distillation on the transfer set. We first generate a transfer set by simply prompting a language model to generate a continuation from the entity definition. Then, we update the model parameters so that, on the transfer set, the distribution of the LM without the definition (the student) matches the distribution of the LM conditioned on the definition (the teacher). Our experiments demonstrate that this approach propagates knowledge updates more effectively than fine-tuning and other gradient-based knowledge-editing methods, without compromising performance in other contexts, even when injecting the definitions of up to 150 entities at once.
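The distillation stage can be illustrated with a toy sketch. It assumes, purely for illustration, that "matching the distribution" means minimizing the KL divergence from the teacher's next-token distribution to the student's; the abstract does not name the objective. The four-token vocabulary, the logit values, the learning rate, and the helper names (`softmax`, `kl_divergence`, `distill_step`) are all hypothetical, and the student's logits are treated as free parameters standing in for real model parameters.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_step(student_logits, teacher_probs, lr=0.5):
    """One gradient-descent step on KL(teacher || softmax(student_logits)).

    The gradient of this loss with respect to the student logits is
    softmax(student_logits) - teacher_probs.
    """
    student_probs = softmax(student_logits)
    return [
        z - lr * (s - t)
        for z, s, t in zip(student_logits, student_probs, teacher_probs)
    ]


# Toy transfer set: one next-token distribution per position.
# Teacher = LM conditioned on the entity definition (hypothetical values).
teacher_probs_per_position = [
    softmax([2.0, 0.5, -1.0, 0.0]),
    softmax([0.0, 1.5, 0.5, -0.5]),
]
# Student = same LM without the definition; starts out uniform here.
student_logits_per_position = [[0.0] * 4 for _ in teacher_probs_per_position]


def total_kl(student_logits_list, teacher_probs_list):
    """Sum the per-position KL divergence over the whole transfer set."""
    return sum(
        kl_divergence(t, softmax(z))
        for z, t in zip(student_logits_list, teacher_probs_list)
    )


initial_kl = total_kl(student_logits_per_position, teacher_probs_per_position)
for _ in range(2000):
    student_logits_per_position = [
        distill_step(z, t)
        for z, t in zip(student_logits_per_position, teacher_probs_per_position)
    ]
final_kl = total_kl(student_logits_per_position, teacher_probs_per_position)

print(f"KL before distillation: {initial_kl:.4f}")
print(f"KL after distillation:  {final_kl:.6f}")
```

In a real setting the teacher and student would be the same pretrained LM, with the definition prepended only to the teacher's input, and the gradient step would update the model's weights rather than per-position logits.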