Pulmonary embolism (PE) is a leading cause of cardiovascular mortality, yet our understanding of optimal management remains limited due to heterogeneous and inaccessible radiology documentation. The PERT Consortium registry standardizes PE management data but depends on resource-intensive manual abstraction. Large language models (LLMs) offer a scalable alternative for automating concept extraction from computed tomography PE (CTPE) reports. This study evaluated the accuracy of LLMs in extracting PE-related concepts compared to a human-curated criterion standard. We retrospectively analyzed MIMIC-IV and Duke Health CTPE reports using multiple LLaMA models. Larger models (70B) outperformed smaller ones (8B), achieving kappa values of 0.98 (PE detection), 0.65-0.75 (PE location), 0.48-0.51 (right heart strain), and 0.65-0.70 (image artifacts). Moderate temperature tuning (0.2-0.5) improved accuracy, while excessive in-context examples reduced performance. A dual-model review framework achieved >80-90% precision. LLMs demonstrate strong potential for automating PE registry abstraction, minimizing manual workload while preserving accuracy.
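As an illustration of how the agreement figures above can be read, the following is a minimal sketch of a kappa computation between model-extracted labels and a human-curated standard. It assumes the reported kappa is Cohen's kappa for two raters, which the abstract does not state; the function name `cohen_kappa` and the example labels are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Cohen's kappa for two equal-length sequences of categorical labels.

    Assumption: the abstract's "kappa" is Cohen's kappa (two raters,
    nominal labels); the source does not specify the variant.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(reference)
    # Observed agreement: fraction of reports where both raters agree.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[lab] * pred_counts[lab] for lab in ref_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight reports (illustrative only, not study data).
human = ["PE", "PE", "no PE", "no PE", "PE", "no PE", "no PE", "no PE"]
llm = ["PE", "PE", "no PE", "no PE", "no PE", "no PE", "no PE", "no PE"]
kappa = cohen_kappa(human, llm)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a concept with imbalanced labels can show high raw accuracy but only moderate kappa.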