Objective: Eviction is an important social and behavioral determinant of health. It is associated with a cascade of negative events that can lead to unemployment, housing insecurity or homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes. Materials and Methods: We first defined eviction status (eviction presence and eviction period) and then annotated eviction status in 5000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, that substantially outperforms other state-of-the-art approaches, such as fine-tuning pre-trained language models like BioBERT and BioClinicalBERT. Moreover, we designed a novel prompt that further improves model performance by exploiting the intrinsic connection between the two sub-tasks of eviction presence and eviction period prediction. Finally, we applied temperature scaling-based calibration to our KIRESH-Prompt method to avoid over-confidence issues arising from the imbalanced dataset. Results: KIRESH-Prompt substantially outperformed strong baselines, including fine-tuned BioClinicalBERT, achieving a Matthews correlation coefficient (MCC) of 0.74672, Macro-F1 of 0.71153, and Micro-F1 of 0.83396 in predicting eviction period, and an MCC of 0.66827, Macro-F1 of 0.62734, and Micro-F1 of 0.7863 in predicting eviction presence. We also conducted additional experiments on a benchmark social and behavioral determinants of health (SBDH) dataset to demonstrate the generalizability of our methods. Conclusion and Future Work: KIRESH-Prompt substantially improved eviction status classification. We plan to deploy KIRESH-Prompt on VHA EHRs as an eviction surveillance system to help address housing insecurity among US Veterans.
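The calibration step mentioned above can be illustrated with a minimal sketch of temperature scaling in its standard form: a single scalar T divides the classifier's logits, and T is chosen to minimize the negative log-likelihood on held-out data. This is not the authors' implementation. The function names, the golden-section search, the search bounds, and the toy logits and labels are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for row, y in zip(logit_rows, labels):
        scaled = [z / temperature for z in row]
        m = max(scaled)
        # log-sum-exp gives the log of the softmax normalizer without underflow
        log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += log_norm - scaled[y]
    return total / len(labels)


def fit_temperature(logit_rows, labels, lo=0.05, hi=20.0, iters=60):
    """Golden-section search over log(T) for the T that minimizes held-out NLL.

    The NLL is convex in 1/T, hence unimodal in T, so a bracketing search
    is sufficient. The bounds lo and hi are illustrative assumptions.
    """
    ratio = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if nll(logit_rows, labels, math.exp(c)) < nll(logit_rows, labels, math.exp(d)):
            b = d
        else:
            a = c
    return math.exp((a + b) / 2)


# Toy held-out set (made up): the model is near-certain of class 0 on every
# note, but class 0 is correct only three times out of four.
val_logits = [[8.0, 0.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]

T = fit_temperature(val_logits, val_labels)
calibrated = softmax(val_logits[0], T)
```

On this toy set the fitted temperature is greater than 1, which softens the predicted probability for class 0 toward the observed 75% accuracy. Because dividing all logits by the same positive T does not change their ranking, the predicted class, and therefore accuracy-type metrics, are unaffected; only the confidence values change.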