Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments. It allows individuals to draw conclusions that follow logically from the premises, provided those premises are true. Recent progress in the domain of large language models (LLMs) has showcased their capability to perform deductive reasoning tasks. Nonetheless, a significant portion of research primarily assesses the accuracy of LLMs in solving such tasks, often overlooking a deeper analysis of their reasoning behavior. In this study, we draw upon principles from cognitive psychology to examine the inferential strategies employed by LLMs through a detailed evaluation of their responses to propositional logic problems. Our findings indicate that LLMs display reasoning patterns akin to those observed in humans, including strategies such as $\textit{supposition following}$ and $\textit{chain construction}$. Moreover, our research demonstrates that the architecture and scale of a model significantly affect its preferred method of reasoning, with more advanced models adopting such strategies more frequently than less sophisticated ones. Importantly, we assert that a model's accuracy, that is, the correctness of its final conclusion, does not necessarily reflect the validity of its reasoning process. This distinction underscores the need for more nuanced evaluation procedures in the field.
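For intuition, the sketch below loosely illustrates the two named strategies on a simple propositional problem; the premises are hypothetical examples chosen for exposition, not items from our evaluation set.

% Schematic, hypothetical illustration of the two strategies on the
% premises $p \lor q$, $\neg p$, and $q \rightarrow r$.
\begin{itemize}
  \item $\textit{Chain construction}$: build a chain of inferences from
        premises to conclusion: from $p \lor q$ and $\neg p$, infer $q$
        (disjunctive syllogism); from $q$ and $q \rightarrow r$, infer $r$
        (modus ponens).
  \item $\textit{Supposition following}$: suppose a candidate proposition
        and trace its consequences through the premises: supposing $p$
        contradicts $\neg p$, so the reasoner follows the alternative $q$,
        which again yields $r$ via $q \rightarrow r$.
\end{itemize}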