Accounts of human language processing have long appealed to implicit ``situation models'' that enrich comprehension with relevant but unstated world knowledge. Here, we apply causal intervention techniques to recent transformer models to analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts the interpretation of an ambiguous pronoun. We identify a relatively small circuit of attention heads that is responsible for propagating information from the context word, thereby guiding which of the candidate noun phrases the pronoun ultimately attends to. We then compare how this circuit behaves in a closely matched ``syntactic'' control where the situation model is not strictly necessary. These analyses suggest distinct pathways through which implicit situation models are constructed to guide pronoun resolution.