When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's behavior does not exhibit the same types of failures because human reasoning uses principled and abstract world models. We evaluate human participants and 25 LLMs on their ability to engage in common-sense reasoning about a variety of everyday situations and observe similar patterns of errors in both people and models. We then identify the set of attention heads driving LLM responses and find that these heads implement a form of pattern matching. These attention heads allow us to predict seemingly inexplicable reasoning errors in people caused by ostensibly irrelevant prompt details. Taken together, our results suggest that everyday causal reasoning in people and LLMs is more consistent with a form of pattern matching than with abstract world models.