Objective: Integrating electronic health record (EHR) data with other resources is essential in rare disease research because disease prevalence is low. Such integration depends on the alignment of the ontologies used for data annotation. The International Classification of Diseases (ICD) is used to annotate clinical diagnoses, and the Human Phenotype Ontology (HPO) is used to annotate phenotypes. Although these ontologies overlap in the biomedical entities they describe, the extent to which they are interoperable is unknown. We investigate how well aligned these ontologies are and whether such alignments facilitate EHR data integration.

Materials and Methods: We conducted an empirical analysis of the coverage of mappings between ICD and HPO. We interpreted this mapping coverage as a proxy for how easily clinical data can be integrated with research ontologies such as HPO. We quantified how exhaustively ICD codes are mapped to HPO by analyzing mappings in the UMLS Metathesaurus. We then analyzed the proportion of ICD codes mapped to HPO within a real-world EHR dataset.

Results and Discussion: Only 2.2% of ICD codes have direct mappings to HPO in UMLS. Within our EHR dataset, fewer than 50% of ICD codes have mappings to HPO terms. ICD codes that are used frequently in EHR data tend to have mappings to HPO, whereas ICD codes that represent rarer medical conditions are seldom mapped.

Conclusion: Interoperability between ICD and HPO via UMLS is limited. While other mapping sources could be incorporated, there are no established conventions for which resources should be used to complement UMLS.
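The coverage measure described in Materials and Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that a "direct mapping" means an ICD code and an HPO term share a UMLS concept identifier (CUI) in MRCONSO.RRF, and that ICD is represented by the source abbreviation `ICD10CM`; the abstract does not state which ICD version was used. The function names, the `make_row` helper, and all codes and CUIs in the toy data are hypothetical.

```python
from collections import defaultdict

# Zero-based column positions in the pipe-delimited MRCONSO.RRF file.
CUI, SAB, CODE = 0, 11, 13


def mapping_coverage(rows, icd_sab="ICD10CM", hpo_sab="HPO"):
    """Return (mapped ICD codes, fraction of ICD codes sharing a CUI with HPO).

    Assumption: sharing a CUI is used here as the definition of a mapping.
    """
    icd_cuis = defaultdict(set)  # ICD code -> CUIs it is attached to
    hpo_cuis = set()             # CUIs that carry at least one HPO term
    for line in rows:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE:
            continue  # skip malformed or empty lines
        if fields[SAB] == icd_sab:
            icd_cuis[fields[CODE]].add(fields[CUI])
        elif fields[SAB] == hpo_sab:
            hpo_cuis.add(fields[CUI])
    mapped = {code for code, cuis in icd_cuis.items() if cuis & hpo_cuis}
    total = len(icd_cuis)
    return mapped, (len(mapped) / total if total else 0.0)


def ehr_coverage(observed_codes, mapped):
    """Fraction of distinct ICD codes seen in an EHR extract that are mapped."""
    observed = set(observed_codes)
    return len(observed & mapped) / len(observed) if observed else 0.0


def make_row(cui, sab, code, name):
    """Build one toy MRCONSO-style row (18 fields plus trailing pipe)."""
    fields = [cui, "ENG", "P", "L0", "PF", "S0", "Y", "A0", "", "", "",
              sab, "PT", code, name, "0", "N", ""]
    return "|".join(fields) + "|"


# Toy data only: these CUIs, codes, and names are invented for illustration.
toy_rows = [
    make_row("C0000001", "ICD10CM", "A00", "toy diagnosis 1"),
    make_row("C0000001", "HPO", "HP:0000001", "toy phenotype 1"),
    make_row("C0000002", "ICD10CM", "A01", "toy diagnosis 2"),
    make_row("C0000003", "ICD10CM", "A02", "toy diagnosis 3"),
    make_row("C0000004", "HPO", "HP:0000002", "toy phenotype 2"),
]

mapped, coverage = mapping_coverage(toy_rows)
print(sorted(mapped), round(coverage, 3))
print(ehr_coverage(["A00", "A00", "A01"], mapped))
```

With a real Metathesaurus release, `rows` would be the lines of an opened MRCONSO.RRF file, and `observed_codes` would be the diagnosis codes from the EHR extract. Counting distinct codes, as here, is only one possible choice; weighting by how often each code occurs would give a different figure.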