A growing literature studies how humans incorporate advice from algorithms. This study examines an algorithm with millions of daily users: ChatGPT. In a preregistered study, 118 student participants answer 2,828 multiple-choice questions across 25 academic subjects. Participants receive advice from a GPT model and can update their initial responses. The advisor's identity ("AI chatbot" versus a human "expert"), the presence of a written justification, and advice correctness do not significantly affect weight on advice. Instead, participants weigh advice more heavily if they (1) are unfamiliar with the topic, (2) have used ChatGPT in the past, or (3) received more accurate advice previously. The last two effects -- algorithm familiarity and experience -- are stronger with an AI chatbot as the advisor. Participants who receive written justifications are able to discern correct advice and update accordingly. Student participants are miscalibrated in their judgements of ChatGPT advice accuracy; one reason is that they significantly misjudge the accuracy of ChatGPT on 11 of the 25 topics. Participants under-weigh advice by over 50% and can score better by trusting ChatGPT more.