Large Language Models (LLMs) increasingly show reasoning rationales alongside their answers, turning "reasoning" into a user-interface element. While step-by-step rationales are typically associated with model performance, how they influence users' trust and decision-making in factual verification tasks remains unclear. We ran an online study (N=68) manipulating three properties of LLM reasoning rationales: presentation format (instant vs. delayed vs. on-demand), correctness (correct vs. incorrect), and certainty framing (none vs. certain vs. uncertain). We found that correct rationales and certainty cues increased trust, decision confidence, and adoption of AI advice, whereas uncertainty cues reduced them. Presentation format had no significant effect, suggesting users were less sensitive to how reasoning was revealed than to its reliability. Participants indicated that they use rationales primarily to audit outputs and calibrate trust, and that they expect rationales in stepwise, adaptive forms with certainty indicators. Our work shows that user-facing rationales can support decision-making yet, if poorly designed, miscalibrate trust.