Large language models (LLMs) have the potential to aid and improve human decision-making in classification tasks, not only by providing fairly accurate predictions but also by generating cogent narrative explanations of those predictions. Prior work has demonstrated that people generally find AI narrative explanations understandable, trustworthy, and convincing enough to change beliefs and opinions; however, less is known about the impact of narrative explanations on objective human decision-making performance. Here we conduct a large-scale human behavioral experiment to evaluate decision-making performance with LLM-generated narrative explanations of varying persuasiveness. We found that the degree of persuasiveness (or lack thereof) of LLM-based explanations did not meaningfully change decision accuracy relative to a simple AI prediction alone, in agreement with typical results from explainable AI based on feature importance. We found evidence that narratives increased reliance on AI, but they did so both when the AI prediction was correct and when it was incorrect. Exploratory analyses also indicated that more persuasive narratives may have had a detrimental effect on decision response times and on the ability to discriminate between correct and incorrect AI predictions. Overall, this work indicates that including narrative explanations with AI predictions may involve tradeoffs for decision-making performance, and more work is needed to determine how and when narrative explanations impact human decision-making.