Large Language Models (LLMs) such as ChatGPT have received enormous attention over the past year and are now used by hundreds of millions of people every day. The rapid adoption of this technology naturally raises questions about the biases such models might exhibit. In this work, we tested one of these models (GPT-3) on a range of cognitive effects, i.e., systematic patterns usually found in human performance on cognitive tasks. We found that LLMs are indeed prone to several human cognitive effects. Specifically, we show that the priming, distance, SNARC, and size congruity effects are present in GPT-3, while the anchoring effect is absent. We describe our methodology, in particular the way we converted real-world experiments into text-based experiments. Finally, we speculate on the possible reasons why GPT-3 exhibits these effects and discuss whether they are imitated or reinvented.