Do Large Language Models (LLMs) possess a Theory of Mind (ToM)? Research into this question has focused on evaluating LLMs against benchmarks, finding success across a range of social tasks. However, these evaluations do not test for the actual representations posited by ToM: namely, a causal model of mental states and behavior. Here, we use a cognitively grounded definition of ToM to develop and test a new evaluation framework. Specifically, our approach probes whether LLMs have a coherent, domain-general, and consistent model of how mental states cause behavior -- regardless of whether that model matches a human-like ToM. We find that even though LLMs succeed in approximating human judgments in a simple ToM paradigm, they fail at a logically equivalent task and exhibit low consistency between their action predictions and corresponding mental state inferences. As such, these findings suggest that the social proficiency exhibited by LLMs is not the result of a domain-general or consistent ToM.