Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word "banana" encodes the fact that bananas are fruits). Sometimes facts derive from input text itself (a representation of the sentence "I poured out the bottle" encodes the fact that the bottle became empty). We describe REMEDI, a method for learning to map statements in natural language to fact encodings in an LM's internal representation system. REMEDI encodings can be used as knowledge editors: when added to LM hidden representations, they modify downstream generation to be consistent with new facts. REMEDI encodings may also be used as probes: when compared to LM representations, they reveal which properties LMs already attribute to mentioned entities, in some cases making it possible to predict when LMs will generate outputs that conflict with background knowledge or input text. REMEDI thus links work on probing, prompting, and LM editing, and offers steps toward general tools for fine-grained inspection and control of knowledge in LMs.
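The two uses described above, editing and probing, can be pictured with a toy sketch. It assumes that hidden representations and fact encodings are plain vectors of the same dimension, that editing is a direct vector addition, and that probing uses a dot product as the comparison. The abstract says only that encodings are "added to" and "compared to" LM representations, so the similarity measure, the vector values, and the names `edit_representation` and `probe_score` are illustrative assumptions rather than REMEDI's actual implementation.

```python
# Toy illustration only: hand-written vectors stand in for LM hidden states
# and for the fact encodings that a trained REMEDI-style model would produce.


def edit_representation(entity_repr, fact_encoding):
    """Editing use: add a fact encoding to an entity's hidden representation."""
    if len(entity_repr) != len(fact_encoding):
        raise ValueError("vectors must have the same length")
    return [h + e for h, e in zip(entity_repr, fact_encoding)]


def probe_score(entity_repr, fact_encoding):
    """Probing use: compare an encoding with a representation.

    The dot product is an assumed similarity measure for this sketch.
    """
    if len(entity_repr) != len(fact_encoding):
        raise ValueError("vectors must have the same length")
    return sum(h * e for h, e in zip(entity_repr, fact_encoding))


# Hypothetical 4-dimensional hidden state for the entity "bottle" and a
# hypothetical encoding of the statement "the bottle is empty".
bottle = [0.5, -1.0, 0.0, 2.0]
is_empty = [1.0, 0.0, 1.0, 0.0]

before = probe_score(bottle, is_empty)         # score prior to editing
edited = edit_representation(bottle, is_empty)  # representation after the edit
after = probe_score(edited, is_empty)          # score after editing

print(f"probe score before edit: {before}, after edit: {after}")
```

In this toy setting the probe score rises after the edit, mirroring the idea that the edited representation now carries the added property. In a real LM the edited vector would be written back into the hidden state at a chosen layer before generation continues.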