A growing literature studies how humans incorporate advice from algorithms. This study examines an algorithm with millions of daily users: ChatGPT. We conduct a lab experiment in which 118 student participants answer 2,828 multiple-choice questions across 25 academic subjects. We present participants with answers from a GPT model and allow them to update their initial responses. We find that the advisor's identity ("AI chatbot" versus a human "expert"), the presence of a written justification, and advice correctness do not significantly affect weight on advice. Instead, we show that participants weigh advice more heavily if they (1) are unfamiliar with the topic, (2) have used ChatGPT in the past, or (3) have previously received more accurate advice. These three effects -- task difficulty, algorithm familiarity, and experience, respectively -- appear to be stronger with an AI chatbot as the advisor. Moreover, we find that participants are able to place greater weight on correct advice only when written justifications are provided. In a parallel analysis, we find that the student participants are miscalibrated and significantly underestimate the accuracy of ChatGPT on 10 of 25 topics. Students underweight advice by over 50% and would have scored better if they had trusted ChatGPT more.