Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity. Identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no International Classification of Diseases (ICD) code for IDU, and the only place this information is recorded is unstructured free-text clinical progress notes. Although natural language processing (NLP) can efficiently extract such information from unstructured data, no validated tools exist for this task. To address this gap in clinical information, we design and demonstrate a question-answering (QA) framework to extract information on IDU from clinical progress notes. Unlike other methods discussed in the literature, the QA model can extract various types of information without being constrained by predefined entities, relations, or concepts. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. This paper also demonstrates the QA model's ability to extract IDU-related information from temporally out-of-distribution data. The results indicate that a majority (51%) of the answers extracted by the QA model exactly match the gold-standard answer, and 73% contain the gold-standard answer with some additional surrounding words.
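The two reported figures (exact match with the gold-standard answer, and extracted spans that contain the gold-standard answer plus surrounding words) could be computed roughly as in the sketch below. This is an illustration only, not the authors' evaluation code: the function names, the normalization rules (lowercasing, stripping ASCII punctuation, collapsing whitespace), and the toy strings are all assumptions. The abstract does not say whether the 73% figure includes exact matches; this sketch assumes it does.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True when the normalized prediction equals the normalized gold answer."""
    return normalize(prediction) == normalize(gold)


def contains_gold(prediction: str, gold: str) -> bool:
    """True when the gold answer appears as a contiguous token run in the prediction.

    An exact match also counts as containing the gold answer (assumption).
    """
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    n = len(gold_tokens)
    if n == 0:
        return False
    return any(
        pred_tokens[i:i + n] == gold_tokens
        for i in range(len(pred_tokens) - n + 1)
    )


def score(pairs):
    """Return the fraction of (prediction, gold) pairs satisfying each criterion."""
    total = len(pairs)
    if total == 0:
        return {"exact_match": 0.0, "contains_gold": 0.0}
    em = sum(exact_match(p, g) for p, g in pairs)
    cg = sum(contains_gold(p, g) for p, g in pairs)
    return {"exact_match": em / total, "contains_gold": cg / total}


if __name__ == "__main__":
    # Invented toy pairs for illustration; not drawn from any clinical data.
    toy_pairs = [
        ("heroin", "heroin"),                # exact match
        ("injects heroin daily", "heroin"),  # contains the gold answer
        ("denies use", "last week"),         # neither
    ]
    print(score(toy_pairs))
```

Matching on whole tokens rather than raw substrings avoids counting a gold answer that only appears inside a longer word; a different matching rule would give different containment rates.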