Reasoning as Pattern Matching: Shared Mechanisms in Human and LLM Everyday Reasoning

When large language models (LLMs) fail to generalize or make haphazard errors in reasoning, it is often taken as evidence that LLMs are not truly reasoning, but rather performing a kind of pattern matching. The implication is that people's behavior does not exhibit the same types of failures because human reasoning uses principled and abstract world models. We evaluate human participants and 25 LLMs on their ability to engage in common-sense reasoning about a variety of everyday situations and observe similar patterns of errors in both people and models. We then identify the set of attention heads driving LLM responses and find that these heads implement a form of pattern-matching. These attention heads allow us to predict seemingly inexplicable reasoning errors in people caused by ostensibly irrelevant prompt details. Taken together, our results suggest that everyday causal reasoning in people and LLMs is more consistent with a form of pattern-matching than with abstract world models.

