Clinician skepticism toward opaque AI hinders adoption in high-stakes healthcare. We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations and LLM-driven diagnostic recommendations. Through a within-subjects, counterbalanced study with 16 clinicians across nephrology and obstetrics, we comprehensively evaluated AICare using objective measures (task completion time and error rate), subjective assessments (NASA-TLX, SUS, and confidence ratings), and semi-structured interviews. Our findings indicate that AICare reduced clinicians' cognitive workload. Beyond performance metrics, qualitative analysis reveals that trust is actively constructed through verification, with interaction strategies diverging by expertise: junior clinicians used the system as cognitive scaffolding to structure their analysis, while experts engaged in adversarial verification to challenge the AI's logic. This work offers design implications for creating AI systems that function as transparent partners, accommodating diverse reasoning styles to augment rather than replace clinical judgment.