When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's behavior does not exhibit the same types of failures because human reasoning uses principled and abstract world models. We evaluate human participants and 25 LLMs on their ability to engage in common-sense reasoning about a variety of everyday situations and observe similar patterns of errors in both people and models. We then identify the set of attention heads driving LLM responses and find that these heads implement a form of pattern-matching. These attention heads allow us to predict seemingly inexplicable reasoning errors in people caused by ostensibly irrelevant prompt details. Taken together, our results suggest that everyday causal reasoning in people and LLMs is more consistent with a form of pattern-matching than with abstract world models.