Clinical calculators play a vital role in healthcare by offering accurate, evidence-based predictions for purposes such as prognosis. Nevertheless, their widespread use is frequently hindered by usability challenges, poor dissemination, and restricted functionality. Augmenting large language models with extensive collections of clinical calculators is an opportunity to overcome these obstacles and improve workflow efficiency, but manual curation scales poorly. In response, we introduce AgentMD, a novel language agent capable of curating and applying clinical calculators across various clinical contexts. Using the published literature, AgentMD has automatically curated a collection of 2,164 diverse clinical calculators with executable functions and structured documentation, collectively named RiskCalcs. Manual evaluations show that RiskCalcs tools achieve an accuracy of over 80% on three quality metrics. At inference time, AgentMD can automatically select and apply the relevant RiskCalcs tools given any patient description. On the newly established RiskQA benchmark, AgentMD significantly outperforms chain-of-thought prompting with GPT-4 (87.7% vs. 40.9% accuracy). We also applied AgentMD to real-world clinical notes to analyze both population-level and risk-level patient characteristics. In summary, our study illustrates the utility of language agents augmented with clinical calculators for healthcare analytics and patient care.
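To make the idea of a calculator with "executable functions and structured documentation" concrete, the following is a minimal sketch, not the RiskCalcs format itself. The `Calculator` record, its field names, and the keyword-based `select_tools` helper are all hypothetical and chosen for illustration. The abstract does not describe how AgentMD selects tools or extracts arguments, so both steps are simplified stand-ins here. The example calculator is the widely used CHA2DS2-VASc score, which may or may not appear in RiskCalcs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Calculator:
    """Hypothetical tool record: structured documentation plus an executable function."""
    name: str
    purpose: str
    keywords: List[str]
    compute: Callable[..., int]


def cha2ds2_vasc(age: int, female: bool, chf: bool, hypertension: bool,
                 diabetes: bool, stroke_or_tia: bool, vascular_disease: bool) -> int:
    """CHA2DS2-VASc stroke-risk score for atrial fibrillation (range 0 to 9)."""
    score = 0
    score += 1 if chf else 0                              # C: congestive heart failure
    score += 1 if hypertension else 0                     # H: hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)  # A2 / A: age bands
    score += 1 if diabetes else 0                         # D: diabetes
    score += 2 if stroke_or_tia else 0                    # S2: prior stroke or TIA
    score += 1 if vascular_disease else 0                 # V: vascular disease
    score += 1 if female else 0                           # Sc: sex category
    return score


# A one-entry toy collection; the real collection is described as having 2,164 tools.
TOOLS = [
    Calculator(
        name="CHA2DS2-VASc",
        purpose="Stroke risk in patients with atrial fibrillation",
        keywords=["atrial fibrillation", "stroke"],
        compute=cha2ds2_vasc,
    )
]


def select_tools(note: str, tools: List[Calculator]) -> List[Calculator]:
    """Naive keyword matching, an assumed stand-in for the agent's tool selection."""
    text = note.lower()
    return [t for t in tools if any(k in text for k in t.keywords)]


note = "72-year-old woman with atrial fibrillation, hypertension and diabetes."
for tool in select_tools(note, TOOLS):
    # Arguments are filled in by hand; an agent would extract them from the note.
    score = tool.compute(age=72, female=True, chf=False, hypertension=True,
                         diabetes=True, stroke_or_tia=False, vascular_disease=False)
    print(tool.name, score)  # prints "CHA2DS2-VASc 4"
```

Keeping the documentation fields next to the callable is one plausible design: the description and keywords give a selection step something to match against, while the function guarantees that the arithmetic is done by code and not by the language model.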