Computational reproducibility refers to obtaining consistent results when rerunning an experiment. Jupyter Notebook, a web-based computational notebook application, facilitates running, publishing, and sharing computational experiments along with their results. However, rerunning a Jupyter Notebook may not always produce identical results, owing to factors such as randomness, changes in library versions, or variations in the computational environment. This paper introduces the Similarity-based Reproducibility Index (SRI), a metric for assessing the reproducibility of results in Jupyter Notebooks. SRI employs novel methods, built on similarity metrics tailored to different types of Python objects, to compare rerun outputs against the original outputs. For every cell that generates an output in a rerun notebook, SRI reports a quantitative score in the range [0, 1] together with qualitative insights for assessing reproducibility. The paper also includes a case study in which the proposed metric is applied to a set of Jupyter Notebooks, demonstrating how various similarity metrics can be leveraged to quantify computational reproducibility.