Characterizing Mechanisms for Factual Recall in Language Models

Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in such situations. Specifically, we measure the proportion of the time an LM will use a counterfactual prefix (e.g., "The capital of Poland is London") to overwrite what it learned in pretraining ("Warsaw"). On Pythia and GPT2, the training frequency of both the query country ("Poland") and the in-context city ("London") highly affect the models' likelihood of using the counterfactual. We then use head attribution to identify individual attention heads that either promote the memorized answer or the in-context answer in the logits. By scaling up or down the value vector of these heads, we can control the likelihood of using the in-context answer on new data. This method can increase the rate of generating the in-context answer to 88\% of the time simply by scaling a single head at runtime. Our work contributes to a body of evidence showing that we can often localize model behaviors to specific components and provides a proof of concept for how future methods might control model behavior dynamically at runtime.

翻译：语言模型（LM）通常需要将预训练中记忆的事实与给定上下文中出现的新信息进行整合。这两种来源可能相互矛盾，导致模型内部产生竞争，且目前尚不明确LM将如何解决这种冲突。在一个查询世界首都知识的基准数据集上，我们研究了此类情境下LM行为的分布性和机制性决定因素。具体而言，我们测量了LM使用反事实前缀（例如"波兰的首都是伦敦"）覆盖其预训练知识（"华沙"）的时间比例。在Pythia和GPT2模型中，查询国家（"波兰"）和上下文城市（"伦敦"）的训练频率显著影响模型采用反事实表述的可能性。随后，我们利用头部归因方法识别出在logits中分别促进记忆答案或上下文答案的个体注意力头。通过缩放这些注意力头的值向量，我们可控制模型在新数据上采用上下文答案的可能性。该方法仅需在运行时缩放单个头部，即可将生成上下文答案的概率提升至88%。本研究为"模型行为可定位于特定组件"这一证据体系做出贡献，并为未来方法在运行时动态控制模型行为提供了概念验证。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日