Large Language Models to Identify Social Determinants of Health in Electronic Health Records

Marco Guevara, Shan Chen, Spencer Thomas, Tafadzwa L. Chaunzwa, Idalid Franco, Benjamin Kann, Shalini Moningi, Jack Qian, Madeleine Goldstein, Susan Harper, Hugo JWL Aerts, Guergana K. Savova, Raymond H. Mak, Danielle S. Bitterman

From arXiv. Peer-reviewed version published in npj Digital Medicine: https://www.nature.com/articles/s41746-023-00970-0

Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from electronic health records (EHRs). This study investigated the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text in improving the extraction of these scarcely documented, yet extremely valuable, clinical data. A total of 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated. The study also experimented with synthetic data generation and assessed the models for algorithmic bias. Our best-performing models were fine-tuned Flan-T5 XL (macro-F1 0.71) for any SDoH, and Flan-T5 XXL (macro-F1 0.70) for adverse SDoH. The benefit of augmenting fine-tuning with synthetic data varied across model architecture and size, with smaller Flan-T5 models (base and large) showing the greatest improvements in performance (delta F1 +0.12 to +0.23). Model performance was similar on the in-hospital system dataset but worse on the MIMIC-III dataset. Our best-performing fine-tuned models outperformed the zero- and few-shot performance of ChatGPT-family models on both tasks. These fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p<0.05). At the patient level, our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. Our method can effectively extract SDoH information from clinic notes, performing better than GPT in zero- and few-shot settings. These models could enhance real-world evidence on SDoH and aid in identifying patients needing social support.
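For readers unfamiliar with the macro-F1 metric reported above, the following is a minimal sketch of how such a score can be computed. It is not the authors' evaluation code: the function name `macro_f1` and the label names are hypothetical, and it assumes one label per example for simplicity, whereas the study's actual labeling scheme may differ.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred.

    Assumption: each example carries exactly one label (a simplification
    made for illustration only).
    """
    labels = sorted(set(gold) | set(pred))
    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels, not taken from the study's annotation scheme.
    gold = ["housing", "none", "employment", "none"]
    pred = ["housing", "none", "none", "none"]
    print(round(macro_f1(gold, pred), 3))
```

Because every label contributes equally to the average, a rarely documented category that the model misses entirely pulls the score down as much as a common one would, which is why macro-averaging suits sparsely documented categories such as SDoH.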

