RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction

Applying Large Language Models (LLMs) to clinical tasks has attracted growing research attention. However, real-world clinical decision-making differs significantly from the standardized, exam-style scenarios commonly used in current work. In this paper, we present RiskAgent, a system that performs a broad range of medical risk predictions, covering over 387 risk scenarios across diverse complex diseases, e.g., cardiovascular disease and cancer. RiskAgent is designed to collaborate with hundreds of clinical decision tools, i.e., risk calculators and scoring systems supported by evidence-based medicine. To evaluate our method, we built MedRisk, the first benchmark specialized for risk prediction, comprising 12,352 questions spanning 154 diseases, 86 symptoms, 50 specialties, and 24 organ systems. With 8 billion parameters, RiskAgent achieves 76.33% accuracy, outperforming the most recent commercial LLMs (o1, o3-mini, and GPT-4.5) and nearly doubling the 38.39% accuracy of GPT-4o. On rare diseases, e.g., Idiopathic Pulmonary Fibrosis (IPF), RiskAgent outperforms o1 and GPT-4.5 by 27.27% and 45.46% in accuracy, respectively. Finally, we conduct a generalization evaluation on an external evidence-based diagnosis benchmark, where RiskAgent also achieves the best results. These encouraging results demonstrate the potential of our solution across diverse diagnostic domains. To improve adaptability across different scenarios, we have built and open-sourced a family of models ranging from 1 billion to 70 billion parameters. Our code, data, and models are all available at https://github.com/AI-in-Health/RiskAgent.
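To make the idea of an LLM "collaborating with clinical decision tools" concrete, the following is a minimal sketch, not RiskAgent's actual implementation. It assumes a hypothetical setup in which the model emits a JSON tool call naming a calculator and its arguments, and a small registry dispatches that call to an evidence-based scoring system. The `Patient` fields, the `TOOLS` registry, the `call_tool`/`run_tool_call` helpers, and the JSON message format are all illustrative assumptions; the CHA2DS2-VASc stroke-risk score for atrial fibrillation is used only as one well-known example of such a calculator.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Patient:
    """Hypothetical structured patient record extracted by the model."""
    age: int
    female: bool
    chf: bool = False           # congestive heart failure
    hypertension: bool = False
    diabetes: bool = False
    stroke_tia: bool = False    # prior stroke, TIA, or thromboembolism
    vascular: bool = False      # prior MI, peripheral artery disease, or aortic plaque


def cha2ds2_vasc(p: Patient) -> int:
    """CHA2DS2-VASc stroke-risk score for atrial fibrillation (range 0-9)."""
    score = 0
    score += 1 if p.chf else 0
    score += 1 if p.hypertension else 0
    if p.age >= 75:             # A2: age >= 75 scores 2 points
        score += 2
    elif p.age >= 65:           # A: age 65-74 scores 1 point
        score += 1
    score += 1 if p.diabetes else 0
    score += 2 if p.stroke_tia else 0
    score += 1 if p.vascular else 0
    score += 1 if p.female else 0
    return score


# Hypothetical tool registry: maps a tool name the LLM may emit to a calculator.
TOOLS: Dict[str, Callable[[Patient], int]] = {
    "cha2ds2_vasc": cha2ds2_vasc,
}


def call_tool(name: str, patient: Patient) -> int:
    """Dispatch a tool selected by the model; unknown names raise KeyError."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](patient)


def run_tool_call(message: str) -> int:
    """Parse an assumed JSON tool call {"tool": ..., "args": {...}} and execute it."""
    call = json.loads(message)
    return call_tool(call["tool"], Patient(**call["args"]))


if __name__ == "__main__":
    p = Patient(age=78, female=True, hypertension=True, diabetes=True)
    # 2 (age >= 75) + 1 (female) + 1 (hypertension) + 1 (diabetes) = 5
    print(call_tool("cha2ds2_vasc", p))
```

Keeping the arithmetic inside a deterministic calculator, rather than asking the model to compute the score itself, is one plausible reason such tool use helps: the model only has to select the right tool and extract its inputs.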
