Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Evaluating their emerging abilities is therefore of great importance. In this study, we show that LLMs such as GPT-3 exhibit behavior that strikingly resembles human-like intuition, along with the cognitive errors that come with it. However, LLMs with higher cognitive capabilities, in particular ChatGPT and GPT-4, have learned to avoid these errors and perform in a hyperrational manner. In our experiments, we probe LLMs with the Cognitive Reflection Test (CRT) and with semantic illusions, both of which were originally designed to investigate intuitive decision-making in humans. Our study demonstrates that investigating LLMs with methods from psychology has the potential to reveal otherwise unknown emergent traits.