Automated Identification of Eviction Status from Electronic Health Record Notes

Objective: Eviction is an important social and behavioral determinant of health. It is associated with a cascade of negative events that can lead to unemployment, housing insecurity or homelessness, long-term poverty, and mental health problems. In this study, we developed a natural language processing system to automatically detect eviction status from electronic health record (EHR) notes. Materials and Methods: We first defined eviction status in terms of two attributes (eviction presence and eviction period) and then annotated eviction status in 5000 EHR notes from the Veterans Health Administration (VHA). We developed a novel model, KIRESH, that substantially outperformed other state-of-the-art models, such as fine-tuned pre-trained language models including BioBERT and BioClinicalBERT. Moreover, we designed a novel prompt that further improves model performance by exploiting the intrinsic connection between the two sub-tasks of eviction presence prediction and eviction period prediction. Finally, we applied temperature scaling-based calibration to our KIRESH-Prompt method to mitigate the over-confidence that arises from the imbalanced dataset. Results: KIRESH-Prompt substantially outperformed strong baselines, including a fine-tuned BioClinicalBERT model, achieving 0.74672 MCC, 0.71153 Macro-F1, and 0.83396 Micro-F1 in predicting eviction period and 0.66827 MCC, 0.62734 Macro-F1, and 0.7863 Micro-F1 in predicting eviction presence. We also conducted additional experiments on a benchmark social and behavioral determinants of health (SBDH) dataset to demonstrate the generalizability of our methods. Conclusion and Future Work: KIRESH-Prompt substantially improved eviction status classification. We plan to deploy KIRESH-Prompt on VHA EHRs as an eviction surveillance system to help address housing insecurity among US Veterans.
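To make the calibration step concrete, the following is a minimal sketch of temperature scaling in general, not the KIRESH-Prompt implementation. It assumes a classifier that emits one logit per class and fits a single scalar temperature T on held-out data by minimizing the negative log-likelihood. The function names (`softmax`, `nll`, `fit_temperature`), the golden-section search, and the toy logits and labels are all illustrative assumptions and do not come from the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    loss = 0.0
    for logits, label in zip(logit_rows, labels):
        loss -= math.log(softmax(logits, temperature)[label])
    return loss / len(labels)


def fit_temperature(logit_rows, labels, lo=0.05, hi=20.0, iters=60):
    """Golden-section search over log(T) for the T that minimizes the NLL.

    The NLL is convex in 1/T, hence unimodal in log(T), so a bracketing
    search is sufficient. The search bounds are assumptions of this sketch.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc = nll(logit_rows, labels, math.exp(c))
    fd = nll(logit_rows, labels, math.exp(d))
    for _ in range(iters):
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = nll(logit_rows, labels, math.exp(c))
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = nll(logit_rows, labels, math.exp(d))
    return math.exp((a + b) / 2.0)


# Hypothetical held-out logits for a 3-class task: confident, but wrong on
# two of six examples, which is the over-confident pattern calibration targets.
val_logits = [
    [8.0, 0.0, 0.0],
    [0.0, 7.0, 1.0],
    [6.0, 0.5, 0.0],
    [0.0, 0.0, 9.0],
    [7.0, 0.0, 1.0],
    [0.5, 6.0, 0.0],
]
val_labels = [0, 1, 1, 2, 2, 1]

temperature = fit_temperature(val_logits, val_labels)
calibrated = [softmax(row, temperature) for row in val_logits]
print(f"fitted temperature: {temperature:.3f}")
```

Because every logit in a row is divided by the same positive constant, the predicted class is unchanged and only the confidence attached to it is softened. Scores based on hard labels, such as MCC and F1, are therefore unaffected by this step.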
