As machine learning (ML)-based decision support tools proliferate in clinical practice, understanding how clinicians integrate personalized ML predictions alongside randomized controlled trial (RCT) evidence is critical. We designed a web-based clinical decision support system (CDSS) presenting survival and adverse event data from a simulated RCT and ML model across 12 synthetic multiple myeloma scenarios. In a within-subjects study with 32 physicians, we evaluated how clinicians synthesize competing evidence sources to make treatment decisions. When ML and RCT outputs were concordant, physicians reported greater confidence than with RCT data alone. When results were discordant, most physicians shifted toward the ML-supported treatment, often before reviewing any information about model training or validation, suggesting a tendency toward automation bias rather than algorithm avoidance. Despite reporting higher perceived reliability after viewing model quality disclosures, physicians were largely unable to describe the validation procedures they had reviewed. Taken together, these findings reveal that clinicians may over-rely on ML recommendations even when equipped with tools designed to support critical appraisal. We discuss implications for CDSS design, clinician training, and the institutional safeguards needed before ML-based systems are deployed in high-stakes oncology settings.