Digital twins, virtual replicas of physical entities, are gaining traction in healthcare for personalized monitoring, predictive modeling, and clinical decision support. However, generating interoperable patient digital twins from unstructured electronic health records (EHRs) remains challenging because clinical documentation varies widely and standardized mappings are lacking. This paper presents a semantic NLP-driven pipeline that transforms free-text EHR notes into FHIR-compliant digital twin representations. The pipeline uses named entity recognition (NER) to extract clinical concepts, concept normalization to map the extracted entities to SNOMED-CT or ICD-10 codes, and relation extraction to capture structured associations between conditions, medications, and observations. Evaluation on the MIMIC-IV Clinical Database Demo, validated against MIMIC-IV-on-FHIR reference mappings, yields high F1 scores for both entity and relation extraction, along with improved schema completeness and interoperability compared to baseline methods.
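The three stages named in the abstract (entity extraction, concept normalization, relation extraction) and the final FHIR mapping can be illustrated with a minimal Python sketch. It is not the authors' implementation: a toy dictionary lookup stands in for the NER and normalization models, a naive co-occurrence rule stands in for relation extraction, and the helper names (`TOY_LEXICON`, `extract_entities`, `codeable_concept`, `build_bundle`), the example note, the resource ids, and the `active` medication status are all hypothetical. The output is assumed to follow FHIR R4 field names, using plain dictionaries rather than a FHIR library.

```python
import json
import re

SNOMED_SYSTEM = "http://snomed.info/sct"  # FHIR code system URI for SNOMED CT

# Toy lexicon standing in for trained NER + concept normalization (assumption).
# The medication is left uncoded on purpose, so it maps to a text-only concept.
TOY_LEXICON = {
    "type 2 diabetes": {"kind": "condition", "code": "44054006",
                        "display": "Diabetes mellitus type 2"},
    "metformin": {"kind": "medication", "code": None, "display": "Metformin"},
}


def extract_entities(note):
    """Find lexicon terms in the note (dictionary lookup, not a real NER model)."""
    lowered = note.lower()
    found = []
    for surface, concept in TOY_LEXICON.items():
        match = re.search(re.escape(surface), lowered)
        if match:
            found.append({"text": note[match.start():match.end()], **concept})
    return found


def codeable_concept(entity):
    """Build a FHIR CodeableConcept; add a SNOMED coding only when a code is known."""
    concept = {"text": entity["text"]}
    if entity["code"]:
        concept["coding"] = [{"system": SNOMED_SYSTEM,
                              "code": entity["code"],
                              "display": entity["display"]}]
    return concept


def build_bundle(note, patient_id):
    """Map extracted entities to Condition / MedicationStatement resources."""
    subject = {"reference": f"Patient/{patient_id}"}
    conditions, medications = [], []
    for ent in extract_entities(note):
        if ent["kind"] == "condition":
            conditions.append({"resourceType": "Condition",
                               "id": f"cond-{len(conditions) + 1}",
                               "subject": subject,
                               "code": codeable_concept(ent)})
        else:
            medications.append({"resourceType": "MedicationStatement",
                                "id": f"med-{len(medications) + 1}",
                                "status": "active",  # assumed; not derived from the note
                                "subject": subject,
                                "medicationCodeableConcept": codeable_concept(ent)})
    # Naive relation rule (assumption): every medication in the note is linked
    # to every condition in the same note via reasonReference.
    for med in medications:
        if conditions:
            med["reasonReference"] = [{"reference": f"Condition/{c['id']}"}
                                      for c in conditions]
    return {"resourceType": "Bundle", "type": "collection",
            "entry": [{"resource": r} for r in conditions + medications]}


note = "Patient has Type 2 diabetes, started on metformin 500 mg."
bundle = build_bundle(note, "example")
print(json.dumps(bundle, indent=2))
```

Representing the condition-medication relation as a `reasonReference` on the `MedicationStatement` is one possible encoding; a real pipeline would replace the lexicon and the co-occurrence rule with trained models and would validate the resulting bundle against the target FHIR profiles.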