The FAIR (Findable, Accessible, Interoperable, and Reusable) data principles [1] promote the interoperability of scientific data by encouraging the use of persistent identifiers, standardized vocabularies, and formal metadata structures. Many resources are built on FAIR-compliant, well-annotated vocabularies, yet the collective ecosystem of these resources often fails to interoperate effectively in practice. This persistent challenge stems mainly from the variety of identifier schemes and data models used across these resources. We have created two tools to bridge the gap between interoperability in principle and interoperation in practice. Babel solves the problem of multiple identifier schemes by producing a curated set of identifier mappings that group equivalent identifiers into cliques, which are exposed through high-performance APIs. ORION solves the problem of multiple data models by ingesting knowledge bases and transforming them into a common, community-managed data model. Here, we describe Babel and ORION and demonstrate their ability to support data interoperation. A library of fully interoperable knowledge bases created by applying Babel and ORION is available for download and use at https://robokop.renci.org.
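To make the two steps concrete, the following is a minimal Python sketch, not the Babel or ORION implementation. It assumes that pairwise identifier mappings can be merged transitively with a union-find structure into groups of equivalent identifiers (the "cliques"), that each clique is represented by a preferred identifier chosen from a single prefix-priority list, and that ORION-style normalization amounts to rewriting each edge's identifiers and predicate into the shared vocabulary. The functions `build_normalizer` and `normalize_edges`, the `prefix_priority` rule, and all prefixes and identifiers are hypothetical; `biolink:related_to` is used only as an example target predicate.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest that merges pairwise equivalences into groups."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_normalizer(mappings, prefix_priority):
    """Hypothetical helper: return {identifier: preferred identifier of its clique}."""
    uf = UnionFind()
    for a, b in mappings:
        uf.union(a, b)

    # Group identifiers by their root; each group is one clique of equivalents.
    cliques = defaultdict(set)
    for curie in list(uf.parent):
        cliques[uf.find(curie)].add(curie)

    # Assumed rule: the highest-priority prefix wins, ties broken alphabetically.
    rank = {prefix: i for i, prefix in enumerate(prefix_priority)}

    def sort_key(curie):
        return (rank.get(curie.split(":", 1)[0], len(rank)), curie)

    normalizer = {}
    for members in cliques.values():
        preferred = min(members, key=sort_key)
        for curie in members:
            normalizer[curie] = preferred
    return normalizer


def normalize_edges(edges, normalizer, predicate_map):
    """Hypothetical helper: rewrite (subject, predicate, object) edges.

    Identifiers or predicates without a mapping are passed through unchanged.
    """
    return [
        (normalizer.get(s, s), predicate_map.get(p, p), normalizer.get(o, o))
        for s, p, o in edges
    ]


# Illustrative inputs only; every prefix and identifier here is made up.
mappings = [("ONTO_A:1", "ONTO_B:7"), ("ONTO_B:7", "ONTO_C:x"), ("ONTO_D:9", "ONTO_E:3")]
priority = ["ONTO_B", "ONTO_A", "ONTO_C", "ONTO_D", "ONTO_E"]
edges = [("ONTO_C:x", "src:associated", "ONTO_E:3"), ("ONTO_A:1", "src:other", "ONTO_Z:5")]

normalizer = build_normalizer(mappings, priority)
print(normalize_edges(edges, normalizer, {"src:associated": "biolink:related_to"}))
# [('ONTO_B:7', 'biolink:related_to', 'ONTO_D:9'), ('ONTO_B:7', 'src:other', 'ONTO_Z:5')]
```

Because equivalence is transitive, the sketch merges `ONTO_A:1`, `ONTO_B:7`, and `ONTO_C:x` into one clique even though no single mapping links all three. Edges from two sources that used different identifiers for the same entity therefore land on the same node after normalization.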