Evaluating Large Language Models for Automated Clinical Abstraction in Pulmonary Embolism Registries: Performance Across Model Sizes, Versions, and Parameters


March 26, 2025


Mahmoud Alwakeel, Emory Buck, Jonathan G. Martin, Imran Aslam, Sudarshan Rajagopal, Jian Pei, Mihai V. Podgoreanu, Christopher J. Lindsell, An-Kwok Ian Wong

Pulmonary embolism (PE) is a leading cause of cardiovascular mortality, yet our understanding of optimal management remains limited due to heterogeneous and inaccessible radiology documentation. The PERT Consortium registry standardizes PE management data but depends on resource-intensive manual abstraction. Large language models (LLMs) offer a scalable alternative for automating concept extraction from computed tomography PE (CTPE) reports. This study evaluated the accuracy of LLMs in extracting PE-related concepts compared to a human-curated criterion standard. We retrospectively analyzed MIMIC-IV and Duke Health CTPE reports using multiple LLaMA models. Larger models (70B) outperformed smaller ones (8B), achieving kappa values of 0.98 (PE detection), 0.65-0.75 (PE location), 0.48-0.51 (right heart strain), and 0.65-0.70 (image artifacts). Moderate temperature tuning (0.2-0.5) improved accuracy, while excessive in-context examples reduced performance. A dual-model review framework achieved >80-90% precision. LLMs demonstrate strong potential for automating PE registry abstraction, minimizing manual workload while preserving accuracy.
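The agreement figures above are reported as kappa values against a human-curated criterion standard. As a rough illustration of how such a statistic is computed, the following minimal sketch implements Cohen's kappa for two label sequences. It assumes the abstract's "kappa" means Cohen's kappa, which the abstract does not specify. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each rater's marginal rate.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy, made-up labels for illustration only (not study data).
human = ["PE", "no PE", "PE", "no PE", "no PE", "PE", "no PE", "no PE"]
model = ["PE", "no PE", "PE", "no PE", "PE", "PE", "no PE", "no PE"]

kappa = cohen_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

In this toy case the two raters agree on 7 of 8 reports (observed agreement 0.875) while chance agreement is 0.5, so kappa is 0.75. Because kappa corrects for chance, it can sit well below raw percent agreement, which matters for concepts with imbalanced labels.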

