The development of highly fluent large language models (LLMs) has prompted increased interest in assessing their reasoning and problem-solving capabilities. We investigate whether several LLMs can solve a classic type of deductive reasoning problem from the cognitive science literature. The tested LLMs have limited ability to solve these problems in their conventional form. We performed follow-up experiments to investigate whether changes to the presentation format and content improve model performance. We find performance differences between conditions; however, these changes do not improve overall performance. Moreover, we find that performance interacts with presentation format and content in unexpected ways that differ from human performance. Overall, our results suggest that LLMs have unique reasoning biases that are only partially predicted by human reasoning performance.