A Large Language Model (LLM) is an artificial intelligence system that has been trained on vast amounts of natural language data, enabling it to generate human-like responses to written or spoken language input. GPT-3.5 is an example of an LLM that supports a conversational agent called ChatGPT. In this work, we used a series of novel prompts to determine whether ChatGPT shows heuristics, biases, and other decision effects. We also tested the same prompts on human participants. Across four studies, we found that ChatGPT was influenced by random anchors in making estimates (Anchoring Heuristic, Study 1); it judged the likelihood of two events occurring together to be higher than the likelihood of either event occurring alone, and it was erroneously influenced by salient anecdotal information (Representativeness and Availability Heuristics, Study 2); it found an item to be more efficacious when its features were presented positively rather than negatively, even though both presentations contained identical information (Framing Effect, Study 3); and it valued an owned item more than a newly found item, even though the two items were identical (Endowment Effect, Study 4). In each study, human participants showed similar effects. Heuristics and related decision effects in humans are thought to be driven by cognitive and affective processes such as loss aversion and effort reduction. The fact that an LLM, which lacks these processes, also shows such effects invites consideration of the possibility that language may play a role in generating these effects in humans.