Large language models (LLMs) have shown considerable potential in supporting medical diagnosis. However, their effective integration into clinical workflows is hindered by physicians' difficulty in perceiving and trusting LLM capabilities, which often results in miscalibrated trust. Existing model evaluations emphasize standardized benchmarks and predefined tasks, offering limited insight into clinical reasoning practices. Moreover, research on human-AI collaboration has rarely examined physicians' perceptions of LLMs' clinical reasoning capability. In this work, we investigate how physicians perceive LLMs' capabilities in the clinical reasoning process. We designed clinical cases, collected the corresponding LLM analyses, and obtained evaluations from physicians (N=37) to quantitatively represent their perceived LLM diagnostic capabilities. By comparing these perceived evaluations with benchmark performance, our study highlights the aspects of clinical reasoning that physicians value and underscores the limitations of benchmark-based evaluation. We further discuss implications and opportunities for fostering trustworthy collaboration between physicians and LLMs in LLM-supported clinical reasoning.