Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word "banana" encodes the fact that bananas are fruit). Sometimes they derive from the input text itself (a representation of the sentence "I poured out the bottle" encodes the fact that the bottle became empty). Tools for inspecting and modifying LM fact representations would be useful almost everywhere LMs are used: they would make it possible to update models when the world changes, to localize and remove sources of bias, and to identify errors in generated text. We describe REMEDI, an approach for querying and modifying factual knowledge in LMs. REMEDI learns a map from textual queries to fact encodings in an LM's internal representation system. These encodings can be used as knowledge editors: by adding them to LM hidden representations, we can modify downstream generation to be consistent with new facts. REMEDI encodings can also be used as model probes: by comparing them to LM representations, we can determine which properties LMs attribute to mentioned entities, and predict when they will generate outputs that conflict with background knowledge or the input text. REMEDI thus links work on probing, prompting, and model editing, and offers steps toward general tools for fine-grained inspection and control of knowledge in LMs.
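The two uses of fact encodings described above, editing by addition and probing by comparison, can be illustrated with a toy sketch. It assumes, for illustration only, that a fact encoding is produced by an affine map (W, b) applied to a vector representing the attribute text, that editing means adding the encoding to an entity's hidden vector, and that probing means taking a dot product between the two. The function names `encode_fact`, `edit_hidden`, and `probe_score`, the two-dimensional vectors, and the fixed identity map are all hypothetical. In a real LM the vectors would be high-dimensional hidden states taken from a chosen layer, and the map would be learned rather than fixed.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a: Vector, b: Vector) -> Vector:
    """Elementwise vector addition."""
    return [ai + bi for ai, bi in zip(a, b)]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def encode_fact(W: Matrix, b: Vector, attr_hidden: Vector) -> Vector:
    """Hypothetical editor: an affine map from an attribute representation
    to a fact encoding in the LM's hidden space (learned in practice)."""
    return add(matvec(W, attr_hidden), b)


def edit_hidden(entity_hidden: Vector, fact_encoding: Vector) -> Vector:
    """Editing: add the fact encoding to the entity's hidden representation."""
    return add(entity_hidden, fact_encoding)


def probe_score(entity_hidden: Vector, fact_encoding: Vector) -> float:
    """Probing: a larger inner product suggests the representation
    already reflects the attribute (toy scoring assumption)."""
    return dot(entity_hidden, fact_encoding)


# Toy usage with made-up 2-d vectors and an identity map.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
attr = [1.0, 0.0]    # stand-in for the representation of an attribute text
entity = [0.0, 1.0]  # stand-in for an entity's hidden state

enc = encode_fact(W, b, attr)
score_before = probe_score(entity, enc)
edited = edit_hidden(entity, enc)
score_after = probe_score(edited, enc)
print(enc, score_before, edited, score_after)
```

In this toy setting the entity's representation initially has no component along the fact encoding, so its probe score is zero; after the encoding is added, the score rises. The sketch only conveys the shape of the idea: it says nothing about how the map is trained or which layer is edited.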