Cross-domain data integration drives interdisciplinary data reuse and knowledge transfer. However, each discipline maintains its own metadata schemas and domain ontologies and employs distinct conceptual models and application profiles, which complicates semantic interoperability. The W3C Data Catalog Vocabulary (DCAT) offers a widely adopted RDF vocabulary for describing datasets and their distributions, but its core model is intentionally lightweight. Numerous domain-specific application profiles have emerged to enrich DCAT's expressivity, the best known being DCAT-AP for public data. To facilitate cross-domain interoperability for research data, we propose DCAT-AP PLUS (DCAT-AP+), a DCAT Application Profile (P)roviding additional (L)inks to (U)se-case (S)pecific context. This generic application profile enables a comprehensive representation of the provenance and context of research data generation. DCAT-AP+ introduces an upper-level layer that individual domains can specialize without sacrificing compatibility. We demonstrate the application of DCAT-AP+ and a domain-specific profile, ChemDCAT-AP, to showcase the potential for data integration between the neighboring disciplines of chemistry and catalysis. We adopt LinkML, a YAML-based modeling framework, to support schema inheritance, generate domain-specific subschemas, and provide mechanisms for data type harmonization, validation, and format conversion, ensuring smooth integration of DCAT-AP+ and ChemDCAT-AP within existing data infrastructures.
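The idea of an upper-level layer that domains specialize without losing compatibility can be illustrated with a minimal sketch. It uses plain Python dataclasses rather than LinkML, and every class and field name in it (`Activity`, `Dataset`, `ChemicalReaction`, `was_generated_by`, `solvent`, `temperature_kelvin`, `generic_view`) is a hypothetical stand-in chosen for illustration, not a term taken from DCAT-AP+ or ChemDCAT-AP.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Activity:
    """Generic provenance activity (hypothetical upper-level class)."""
    id: str
    title: str


@dataclass
class Dataset:
    """Generic dataset that links to the activity that generated it."""
    id: str
    title: str
    was_generated_by: Optional[Activity] = None


@dataclass
class ChemicalReaction(Activity):
    """Hypothetical domain specialization adding chemistry context."""
    solvent: Optional[str] = None
    temperature_kelvin: Optional[float] = None


def generic_view(dataset: Dataset) -> dict:
    """Project a dataset onto the upper-level fields only.

    A consumer that knows just the generic layer can still read a
    domain-specific record, because the specialization only adds fields.
    """
    activity = dataset.was_generated_by
    generic_activity = None
    if activity is not None:
        generic_activity = {
            f.name: getattr(activity, f.name) for f in fields(Activity)
        }
    return {
        "id": dataset.id,
        "title": dataset.title,
        "was_generated_by": generic_activity,
    }


# Illustrative record with made-up identifiers and values.
reaction = ChemicalReaction(
    id="ex:rxn1",
    title="Esterification",
    solvent="toluene",
    temperature_kelvin=353.15,
)
dataset = Dataset(id="ex:ds1", title="NMR spectrum", was_generated_by=reaction)
view = generic_view(dataset)
```

In this sketch the domain-specific record remains a valid instance of the generic class, which is the kind of compatibility that schema inheritance is meant to preserve. In an actual LinkML schema the same relationship would be declared in YAML rather than in Python code.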