Research on emergent patterns in Large Language Models (LLMs) has gained significant traction in both psychology and artificial intelligence, creating a need for a comprehensive review that synthesizes this complex landscape. In this article, we systematically review LLMs' capabilities across three important cognitive domains: decision-making biases, reasoning, and creativity. We focus on empirical studies that draw on established psychological tests and compare LLMs' performance with human benchmarks. On decision-making, our synthesis reveals that although LLMs demonstrate several human-like biases, some biases observed in humans are absent, which indicates cognitive patterns that only partially align with human decision-making. On reasoning, advanced LLMs such as GPT-4 exhibit deliberative reasoning akin to human System-2 thinking, whereas smaller models fall short of human-level performance. In creativity, a distinct dichotomy emerges: LLMs excel in language-based creative tasks such as storytelling, but struggle with divergent thinking tasks that require real-world context. Nonetheless, studies suggest that LLMs hold considerable potential as collaborators that augment creativity in human-machine problem-solving settings. Finally, we discuss key limitations and offer guidance for future research in areas such as memory, attention, and open-source model development.