Analogical reasoning, the capacity to identify and map structural relationships between different domains, is fundamental to human cognition and learning. Recent studies have shown that large language models (LLMs) can sometimes match humans on analogical reasoning tasks, raising the possibility that analogical reasoning might emerge from domain-general processes. However, it remains debated whether these emergent capacities are largely superficial, limited to simple relations seen during training, or whether they instead encompass the flexible representational and mapping capabilities that are the focus of leading cognitive models of analogy. In this study, we introduce novel analogical reasoning tasks that require participants to map between semantically contentful words and sequences of letters and other abstract characters. These tasks necessitate the ability to flexibly re-represent rich semantic information, an ability known to be central to human analogy but thus far not well captured by existing cognitive theories and models. We assess the performance of both human participants and LLMs on tasks that probe reasoning from semantic structure and semantic content, introducing variations that test the robustness of their analogical inferences. Advanced LLMs match human performance across several conditions, though humans and LLMs respond differently to certain task variations and semantic distractors. Our results thus provide new evidence that LLMs may offer a how-possibly explanation of human analogical reasoning in contexts that are not yet well modeled by existing theories, but that even today's best models are unlikely to yield how-actually explanations.