Background: Electronic health records (EHRs) enable machine learning for diagnosis, prognosis, and clinical decision support. However, EHR standards vary across countries and hospitals, so records are often incompatible. This limits large-scale and cross-clinical machine learning. To address this complexity, a metadata repository that catalogues the available data elements, their value domains, and their compatibility is an essential tool. It allows researchers to leverage relevant data for tasks such as identifying undiagnosed rare disease patients.

Results: Within the Screen4Care project, we developed S4CMDR, an open-source metadata repository built on ISO 11179-3 and following a middle-out metadata standardisation approach. It automates cataloguing to reduce errors and enables the discovery of compatible feature sets across data registries. S4CMDR supports on-premise Linux deployment and cloud hosting, with state-of-the-art user authentication and an accessible interface.

Conclusions: S4CMDR is a clinical metadata repository for registering and discovering compatible EHRs. Its novel contributions include a microservice architecture, a middle-out standardisation approach, and a user-friendly interface for error-free data registration and visualisation of metadata compatibility. We validate S4CMDR through case studies involving rare disease patients. We invite clinical data holders to populate S4CMDR with their metadata to validate its generalisability and support further development.
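The following minimal Python sketch illustrates what it means for a metadata repository to catalogue data elements, their value domains, and their compatibility across registries. It is loosely modelled on the ISO 11179 idea that a data element pairs a data element concept with a value domain. The class names, fields, registry names, and the strict-equality compatibility rule are hypothetical illustrations chosen for clarity. They are not S4CMDR's actual data model, API, or matching logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueDomain:
    """Hypothetical value domain: a datatype plus either a unit or an enumerated value set."""
    datatype: str                                 # e.g. "integer", "decimal", "code"
    unit: Optional[str] = None                    # unit of measure for described domains
    permissible_values: Optional[frozenset] = None  # allowed codes for enumerated domains


@dataclass(frozen=True)
class DataElement:
    """Hypothetical data element: a concept measured in one registry with one value domain."""
    registry: str        # illustrative registry identifier
    concept: str         # data element concept, e.g. "age_at_diagnosis"
    value_domain: ValueDomain


def compatible(a: DataElement, b: DataElement) -> bool:
    # Simplifying assumption: elements are compatible only if they share the concept
    # and have an identical value domain (same datatype, unit and permissible values).
    return a.concept == b.concept and a.value_domain == b.value_domain


def compatible_feature_set(elements, registries):
    """Return concepts present in every requested registry with mutually compatible elements."""
    wanted = set(registries)
    by_concept = {}
    for e in elements:
        if e.registry in wanted:
            by_concept.setdefault(e.concept, []).append(e)
    result = set()
    for concept, els in by_concept.items():
        if {e.registry for e in els} != wanted:
            continue  # concept missing from at least one registry
        if all(compatible(els[0], other) for other in els[1:]):
            result.add(concept)
    return result


# Illustrative catalogue entries for two hypothetical registries.
elements = [
    DataElement("registry_a", "age_at_diagnosis", ValueDomain("integer", unit="years")),
    DataElement("registry_b", "age_at_diagnosis", ValueDomain("integer", unit="years")),
    DataElement("registry_a", "sex", ValueDomain("code", permissible_values=frozenset({"M", "F"}))),
    DataElement("registry_b", "sex", ValueDomain("code", permissible_values=frozenset({"male", "female", "unknown"}))),
    DataElement("registry_a", "body_weight", ValueDomain("decimal", unit="kg")),
    DataElement("registry_b", "body_weight", ValueDomain("decimal", unit="lb")),
]

print(sorted(compatible_feature_set(elements, {"registry_a", "registry_b"})))
# ['age_at_diagnosis']
```

Only `age_at_diagnosis` qualifies here, because the `sex` code lists and the `body_weight` units differ. A production repository would likely relax the exact-equality rule, for example by recording unit conversions or code mappings between value domains. This sketch deliberately omits that.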