Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

From arXiv, 23 pages

A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily because rigorous and cost-effective evaluation methodologies are lacking. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or to direct clinical trials with either no controls or only patient-centered controls. Moreover, the role of clinicians in collaborating with AI, which is pivotal for determining AI's impact on clinical practice, is often overlooked. For the first time, we emphasize the need for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements that offer new evaluation methodologies for AI models in clinical practice, provide a preclinical-like setting mirroring conventional medicine, and reshape development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
