The exponential growth of heterogeneous healthcare data from electronic health records (EHRs), medical imaging, wearable sensors, and biomedical research has accelerated the adoption of data lakes and centralized architectures capable of handling the Volume, Variety, and Velocity of Big Data for advanced analytics. However, without effective governance, these repositories risk devolving into disorganized data swamps. Ontology-driven semantic data management addresses this risk by linking metadata to healthcare knowledge graphs, thereby enhancing semantic interoperability, improving data discoverability, and enabling expressive, domain-aware access. This review follows a systematic research strategy. We formulate key research questions, conduct a structured literature search across major academic databases, and analyze and classify the selected studies into six categories of ontology-driven healthcare analytics: (i) ontology-driven integration frameworks, (ii) semantic modeling for metadata enrichment, (iii) ontology-based data access (OBDA), (iv) basic semantic data management, (v) ontology-based reasoning for decision support, and (vi) semantic annotation for unstructured data. We further examine the integration of ontology technologies with Big Data frameworks such as Hadoop, Spark, and Kafka, highlighting their combined potential to deliver scalable and intelligent healthcare analytics. For each category, we review recent techniques, representative case studies, technical and organizational challenges, and emerging trends such as artificial intelligence, machine learning, the Internet of Things (IoT), and real-time analytics, with the aim of guiding the development of sustainable, interoperable, and high-performance healthcare data ecosystems.