Objective: Social determinants of health (SDOH) affect health outcomes and are documented in the electronic health record (EHR) through both structured data and unstructured clinical notes. Clinical notes, however, often contain more comprehensive SDOH information, describing aspects such as status, severity, and temporality. This work has two primary objectives: i) to develop a natural language processing (NLP) information extraction model that captures detailed SDOH information, and ii) to evaluate the information gain achieved by applying this SDOH extractor to clinical narratives and combining the extracted representations with existing structured data.

Materials and Methods: We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across multiple dimensions. In an EHR case study, we applied the extractor to a large clinical data set of 225,089 patients and 430,406 notes containing social history sections, and compared the extracted SDOH information with existing structured data.

Results: The SDOH extractor achieved an F1 score of 0.86 on a withheld test set. In the EHR case study, the extracted SDOH information complemented existing structured data: 32% of homeless patients, 19% of current tobacco users, and 10% of drug users had these health risk factors documented only in the clinical narrative.

Conclusions: Using EHR data to identify SDOH health risk factors and social needs may improve patient care and outcomes. Semantic representations of text-encoded SDOH information can augment existing structured data, and this more comprehensive SDOH representation can help health systems identify and address these social needs.
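To make the case-study percentages concrete, the following is a minimal Python sketch of one way a "narrative-only" share could be computed for a single risk factor. It assumes that each source (structured EHR fields and the NLP extractor) yields a set of patient IDs flagged for the factor, and that the denominator is every patient flagged by either source. The function name `narrative_only_fraction` and the toy patient IDs are hypothetical, and the study's exact cohort and denominator definitions may differ.

```python
from typing import Set


def narrative_only_fraction(structured: Set[str], extracted: Set[str]) -> float:
    """Share of flagged patients whose risk factor appears only in clinical notes.

    Assumption (not from the source): the denominator is the union of patients
    flagged by structured data or by the NLP extractor.

    structured: patient IDs with the risk factor recorded in structured fields
    extracted:  patient IDs with the risk factor found in notes by the extractor
    """
    positives = structured | extracted      # patients flagged by either source
    if not positives:
        return 0.0                          # avoid division by zero for empty cohorts
    narrative_only = extracted - structured  # evidence exists only in the narrative
    return len(narrative_only) / len(positives)


if __name__ == "__main__":
    # Hypothetical toy patient IDs, not data from the study.
    structured_homeless = {"p1", "p2", "p3"}
    extracted_homeless = {"p2", "p3", "p4"}
    # Only "p4" is flagged solely by the extractor: 1 of 4 flagged patients.
    print(narrative_only_fraction(structured_homeless, extracted_homeless))  # 0.25
```

Using set union and difference keeps the calculation patient-level rather than note-level, which matches how the abstract reports its results (shares of patients, not of notes).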