The application of machine learning to healthcare data is often hindered by the lack of standardized, semantically explicit representations, which limits interoperability and reproducibility across datasets and experiments. The Medical Event Data Standard (MEDS) addresses these issues by introducing a minimal, event-centric data model designed for reproducible machine-learning workflows on health data. However, MEDS is specified as a data format and does not natively integrate with the Semantic Web ecosystem. In this article, we introduce MEDS-OWL, a lightweight OWL ontology that provides formal concepts and relations for representing MEDS datasets as RDF graphs. We also implement meds2rdf, a Python conversion library that transforms MEDS events into RDF graphs conforming to the ontology. We evaluate the approach on two datasets: a synthetic clinical cohort describing care pathways for ruptured intracranial aneurysms, and a real-world subset of MIMIC-IV. To assess semantic consistency, we validate the resulting knowledge graphs with SHACL. The first release of MEDS-OWL comprises 13 classes, 10 object properties, 20 data properties, and 24 OWL axioms. Combined with meds2rdf, it enables the transformation of MEDS data into FAIR-aligned datasets, provenance-aware publishing, and interoperable exchange of event-based clinical data. By bridging MEDS and the Semantic Web, this work contributes a reusable semantic layer for event-based clinical data and lays the groundwork for subsequent graph-based analytics.
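The following is a minimal sketch of the conversion step, showing how a single MEDS event row could be serialized as RDF in N-Triples. It is not the meds2rdf API. The namespace `https://example.org/meds-owl#` and the terms `Subject`, `Event`, `hasSubject`, `code`, `time`, and `numericValue` are hypothetical placeholders, not actual MEDS-OWL identifiers. The sketch also assumes each event row carries the core MEDS columns `subject_id`, `time`, `code`, and `numeric_value`, and it uses only the Python standard library.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical namespaces: the real MEDS-OWL and dataset IRIs may differ.
MEDS = "https://example.org/meds-owl#"
DATA = "https://example.org/data/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class MedsEvent:
    """One row of a MEDS data table (core columns only)."""
    subject_id: int
    time: Optional[datetime]  # null time = static, time-independent fact
    code: str
    numeric_value: Optional[float] = None


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    """Build a typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{XSD}{datatype}>'


def event_to_ntriples(event: MedsEvent, index: int) -> list[str]:
    """Map one MEDS event to N-Triples lines using hypothetical MEDS-OWL terms."""
    subject = iri(f"{DATA}subject/{event.subject_id}")
    node = iri(f"{DATA}event/{event.subject_id}-{index}")
    triples = [
        # Re-emitting the subject's type per event is harmless: RDF graphs are sets.
        (subject, iri(RDF_TYPE), iri(f"{MEDS}Subject")),
        (node, iri(RDF_TYPE), iri(f"{MEDS}Event")),
        (node, iri(f"{MEDS}hasSubject"), subject),
        (node, iri(f"{MEDS}code"), literal(event.code, "string")),
    ]
    # Optional columns only produce triples when they are present.
    if event.time is not None:
        triples.append(
            (node, iri(f"{MEDS}time"), literal(event.time.isoformat(), "dateTime"))
        )
    if event.numeric_value is not None:
        triples.append(
            (node, iri(f"{MEDS}numericValue"), literal(repr(event.numeric_value), "double"))
        )
    return [f"{s} {p} {o} ." for s, p, o in triples]
```

For example, `event_to_ntriples(MedsEvent(1, datetime(2020, 1, 1, 8, 30), "LAB//HR", 72.0), 0)` yields six triples. A static event with `time=None` and no numeric value yields four. A real converter would more likely build the graph with an RDF library such as rdflib and then check it against SHACL shapes. The hand-rolled serializer here only keeps the example dependency-free.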