Large Language Models (LLMs) and Large Reasoning Models (LRMs) are increasingly used for critical tasks, yet they provide no guarantees about the correctness of their solutions. Users must decide whether to trust the model's answer, aided by reasoning traces, summaries of those traces, or post-hoc generated explanations. Reasoning traces are often interpreted as provenance explanations, despite evidence that they are neither faithful representations of the model's computations nor necessarily semantically meaningful. It is unclear whether explanations or reasoning traces help users identify when the AI is incorrect, or whether they simply persuade users to trust the AI regardless. In this paper, we take a user-centered approach and develop an evaluation protocol to study how different explanation types affect users' ability to judge the correctness of AI-generated answers and how much false trust they engender. We conduct a between-subjects user study that simulates a setting where users cannot verify the solution, and we analyze the false trust engendered by three commonly used LLM explanations: reasoning traces, their summaries, and post-hoc explanations. We also test a contrastive dual-explanation setting in which we present arguments both for and against the AI's answer. We find that reasoning traces and post-hoc explanations are persuasive but not informative: they increase user acceptance of LLM predictions regardless of correctness. In contrast, the dual explanation is the only condition that genuinely improves users' ability to distinguish correct from incorrect AI outputs.