Previous interpretations of language models (LMs) miss important distinctions in how these models process factual information. For example, given the query "Astrid Lindgren was born in" with the corresponding completion "Sweden", no distinction is drawn between whether the prediction rests on exact knowledge of the Swedish author's birthplace or on the heuristic that a person with a Swedish-sounding name was born in Sweden. In this paper, we investigate four different prediction scenarios for which the LM can be expected to show distinct behaviors. These scenarios correspond to different levels of model reliability and different types of information being processed, some of which are less desirable for factual predictions. To facilitate precise interpretations of LMs for fact completion, we propose a model-specific recipe called PrISM for constructing datasets with examples of each scenario based on a set of diagnostic criteria. We apply a popular interpretability method, causal tracing (CT), to the four prediction scenarios and find that while CT produces different results for each scenario, aggregations over a set of mixed examples may only represent the results from the scenario with the strongest measured signal. In summary, we contribute tools for a more granular study of fact completion in language models and analyses that provide a more nuanced understanding of how LMs process fact-related queries.