Recording the provenance of scientific computation results is key to supporting the traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving and sharing. However, existing approaches tend to lack interoperable adoption across workflow management systems. In this work we present Workflow Run RO-Crate, an extension of RO-Crate (Research Object Crate) and Schema.org that captures the provenance of computational workflow executions at different levels of granularity and bundles together all their associated products (inputs, outputs, code, etc.). The model is supported by a diverse, open community that meets regularly to discuss its development, maintenance and adoption. Workflow Run RO-Crate is already implemented by several workflow management systems, allowing interoperable comparisons between workflow runs from heterogeneous systems. We describe the model, its alignment to standards such as W3C PROV, and its implementation in six workflow systems. Finally, we illustrate the application of Workflow Run RO-Crate in two machine learning use cases from the digital image analysis domain. The RO-Crate corresponding to this article is available at https://w3id.org/ro/doi/10.5281/zenodo.10368989
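To make the model described above more concrete, here is a minimal sketch of the JSON-LD metadata (`ro-crate-metadata.json`) that a crate recording a single workflow run might contain. It assumes the common RO-Crate pattern in which a Schema.org `CreateAction` links the executed workflow (`instrument`) to the inputs it consumed (`object`) and the outputs it produced (`result`), with the root dataset referencing the action via `mentions`. The profile URIs and versions are assumptions to check against the current specification. The helper names, file names (`workflow.cwl`, `images.zip`, `model.h5`, `metrics.csv`), the `#run-1` identifier and the timestamps are hypothetical. The sketch deliberately omits metadata that a complete RO-Crate requires, such as `name`, `description`, `datePublished` and `license` on the root dataset.

```python
import json

# Assumed context and profile URIs; verify versions against the
# RO-Crate and Workflow Run RO-Crate specifications before use.
RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"
PROFILES = [
    "https://w3id.org/ro/wfrun/process/0.5",
    "https://w3id.org/ro/wfrun/workflow/0.5",
    "https://w3id.org/workflowhub/workflow-ro-crate/1.0",
]


def ref(entity_id):
    """Return a JSON-LD reference to another entity in the crate."""
    return {"@id": entity_id}


def build_workflow_run_crate(workflow, inputs, outputs, start_time, end_time):
    """Build a minimal (incomplete) metadata graph for one workflow run.

    Hypothetical helper: every file name passed in is treated as a data
    entity stored inside the crate.
    """
    action_id = "#run-1"  # hypothetical local identifier for the execution
    graph = [
        # Metadata descriptor: says which RO-Crate version this file follows.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": ref("https://w3id.org/ro/crate/1.1"),
            "about": ref("./"),
        },
        # Root dataset: bundles workflow, inputs and outputs together.
        {
            "@id": "./",
            "@type": "Dataset",
            "conformsTo": [ref(p) for p in PROFILES],
            "hasPart": [ref(f) for f in [workflow, *inputs, *outputs]],
            "mainEntity": ref(workflow),
            "mentions": ref(action_id),
        },
        # The workflow plan (prospective provenance).
        {
            "@id": workflow,
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
        # The run itself (retrospective provenance): which workflow was
        # executed (instrument), what it consumed (object), what it produced
        # (result), and when.
        {
            "@id": action_id,
            "@type": "CreateAction",
            "instrument": ref(workflow),
            "object": [ref(f) for f in inputs],
            "result": [ref(f) for f in outputs],
            "startTime": start_time,
            "endTime": end_time,
        },
    ]
    # Plain file entities for every input and output.
    graph += [{"@id": f, "@type": "File"} for f in [*inputs, *outputs]]
    return {"@context": RO_CRATE_CONTEXT, "@graph": graph}


crate = build_workflow_run_crate(
    "workflow.cwl",
    ["images.zip"],
    ["model.h5", "metrics.csv"],
    "2023-12-01T10:00:00Z",
    "2023-12-01T11:30:00Z",
)
print(json.dumps(crate, indent=2))
```

Keeping the action as a separate entity rather than attaching timestamps to the workflow file mirrors the separation between a workflow plan and its executions: a crate describing several runs of the same workflow would simply contain several `CreateAction` entities sharing one `instrument`.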