Generative Large Language Models are autonomous practitioners of evidence-based medicine

Akhil Vaid, Joshua Lampert, Juhee Lee, Ashwin Sawant, Donald Apakama, Ankit Sakhuja, Ali Soroush, Denise Lee, Isotta Landi, Nicole Bussola, Ismail Nabeel, Robbie Freeman, Patricia Kovatch, Brendan Carr, Benjamin Glicksberg, Edgar Argulian, Stamatios Lerakis, Monica Kraft, Alexander Charney, Girish Nadkarni

Source: arXiv. Word count: 4548 words, Figures: 4, Tables: 4

Background: Evidence-based medicine (EBM) is fundamental to modern clinical practice, requiring clinicians to continually update their knowledge and apply the best available clinical evidence in patient care. Rapid advances in medical research make this difficult, because they leave clinicians facing information overload. Artificial intelligence (AI), and specifically Generative Large Language Models (LLMs), offers a promising way to manage this complexity.

Methods: Real-world clinical cases were curated across various specialties and converted into .json files for analysis. The LLMs evaluated included proprietary models (ChatGPT 3.5 and 4, Gemini Pro) and open-source models (LLaMA v2, Mixtral-8x7B). Each model was equipped with tools to retrieve information from the case files and to make clinical decisions, similar to how clinicians must operate in the real world. Model performance was evaluated on four criteria: correctness of the final answer, judicious use of tools, conformity to guidelines, and resistance to hallucinations.

Results: GPT-4 was the most capable of autonomous operation in a clinical setting, and was generally more effective at ordering relevant investigations and conforming to clinical guidelines. Limitations were observed in the models' ability to handle complex guidelines and diagnostic nuances. Retrieval-Augmented Generation made recommendations more tailored to patients and healthcare systems.

Conclusions: LLMs can be made to function as autonomous practitioners of evidence-based medicine. Their ability to use tools can be harnessed to interact with the infrastructure of a real-world healthcare system and to perform patient management tasks in a guideline-directed manner. Prompt engineering may further enhance this potential and transform healthcare for both the clinician and the patient.
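The tool-based setup described in the Methods can be pictured with a minimal sketch. The abstract does not give the case-file schema or the tool interface, so everything below is an assumption made for illustration: the JSON field names, the `CaseTools` class, its method names, and the example case are all hypothetical. The sketch shows a case stored as JSON, tools that read from it, and a call log that could support scoring tool use and catching requests for data that does not exist.

```python
import json

# Hypothetical case file; field names and values are illustrative only,
# not the schema or data used in the study.
CASE_JSON = """
{
  "presentation": "58-year-old with exertional chest pain",
  "investigations": {
    "ecg": "ST depression in leads V4-V6",
    "troponin": "0.02 ng/mL (normal)"
  }
}
"""


class CaseTools:
    """Exposes a case file through named tool calls and logs each call."""

    def __init__(self, case):
        self.case = case
        self.log = []  # (tool_name, argument, found) tuples, in call order

    def get_presentation(self):
        self.log.append(("get_presentation", None, True))
        return self.case["presentation"]

    def order_investigation(self, name):
        results = self.case.get("investigations", {})
        found = name in results
        self.log.append(("order_investigation", name, found))
        # Return an explicit "not available" message instead of inventing a result.
        if found:
            return results[name]
        return f"No result available for '{name}'."


tools = CaseTools(json.loads(CASE_JSON))
print(tools.get_presentation())
print(tools.order_investigation("ecg"))
print(tools.order_investigation("mri_brain"))

# Requests that matched nothing in the case file, in call order.
unavailable = [arg for _, arg, found in tools.log if not found]
print(unavailable)
```

In a setup like this, the log gives a reviewer the sequence of tool calls a model made, which is one plausible basis for judging whether investigations were ordered judiciously; the study's actual scoring procedure is not described in the abstract.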

