Transformer-based language models (LMs) are known to capture factual knowledge in their parameters. While previous work has examined where factual associations are stored, little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. Given a subject-relation query, we study how the model aggregates information about the subject and the relation to predict the correct attribute. With interventions on attention edges, we first identify two critical points where information propagates to the prediction: one from the relation positions, followed by another from the subject positions. Next, by analyzing the information at these points, we unveil a three-step internal mechanism for attribute extraction. First, the representation at the last-subject position goes through an enrichment process, driven by the early MLP sublayers, to encode many subject-related attributes. Second, information from the relation propagates to the prediction. Third, the prediction representation "queries" the enriched subject to extract the attribute. Perhaps surprisingly, this extraction is typically done via attention heads, which often encode subject-attribute mappings in their parameters. Overall, our findings introduce a comprehensive view of how factual associations are stored and extracted internally in LMs, facilitating future research on knowledge localization and editing.
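An "intervention on an attention edge" can be pictured as knocking out the edge: the attention logit from a target position to a source position is set to negative infinity before the softmax, so the target can no longer read from that source. The following is a minimal sketch of that idea, assuming a single causal attention head with random toy vectors. The position layout (subject at positions 0-1, relation at 2-3, prediction at 4) and the names `attention` and `knockout_edges` are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax; math.exp(-inf) evaluates to 0.0,
    # so knocked-out edges receive exactly zero weight.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(q, k, v, blocked=frozenset()):
    """Single-head causal attention (toy sketch).

    `blocked` is a set of (target, source) edges whose logits are set to -inf.
    A position's edge to itself should not be blocked, otherwise every logit
    in its row could be -inf.
    """
    n, d = len(q), len(q[0])
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(i + 1):  # causal: position i attends only to j <= i
            if (i, j) in blocked:
                logits.append(float("-inf"))
            else:
                logits.append(sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d))
        w = softmax(logits)
        weights.append(w)
        outputs.append([sum(w[j] * v[j][t] for j in range(i + 1)) for t in range(len(v[0]))])
    return outputs, weights


def knockout_edges(target, sources):
    # Hypothetical helper: block every edge from `target` to each of `sources`.
    return frozenset((target, s) for s in sources)


# Toy inputs (random, for illustration only).
random.seed(0)
n, d = 5, 4
q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Hypothetical layout: subject tokens, relation tokens, last (prediction) position.
SUBJECT, RELATION, LAST = [0, 1], [2, 3], 4

clean_out, clean_w = attention(q, k, v)
# Block the prediction position from attending to the subject positions.
ko_out, ko_w = attention(q, k, v, blocked=knockout_edges(LAST, SUBJECT))
```

In a real model the same mask would be applied per layer (or per window of layers), and the effect of the knockout would be measured as the change in the probability of the correct attribute; this toy only shows that the blocked edges carry zero weight and that earlier positions are unaffected.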