There is increasing interest in employing large language models (LLMs) as cognitive models. For this purpose, it is essential to understand which properties of human cognition are well-modeled by LLMs and which are not. In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. Surveying the learning science literature, we posit that the problem-solving process can be split into three distinct steps: text comprehension, solution planning, and solution execution. We construct tests for each step in order to understand whether current LLMs display the same cognitive biases as children at each of them. We generate a novel set of word problems for each of these tests, using a neuro-symbolic approach that enables fine-grained control over the problem features. We find evidence that LLMs, with and without instruction-tuning, exhibit human-like biases in both the text-comprehension and the solution-planning steps of the solving process, but not in the final step, in which the arithmetic expressions are executed to obtain the answer.