Why are some research studies easy to reproduce while others are difficult? Casting doubt on the accuracy of scientific work is not fruitful, especially when an individual researcher cannot reproduce the claims made in a paper, since there could be many subjective reasons behind that failure. The field of Machine Learning (ML) faces a reproducibility crisis, and surveys of a portion of published articles have led to a shared realization that, although sharing code repositories is welcome, a code base is not the be-all and end-all for determining whether an article is reproducible. Various parties involved in the publication process have come forward to address the crisis, and measures such as badging articles as reproducible, reproducibility checklists at conferences (\textit{NeurIPS, ICML, ICLR, etc.}), and sharing artifacts on \textit{OpenReview} appear promising. Most of the literature on reproducibility focuses on the measures required to avoid irreproducibility, and there is little research into the effort involved in reproducing these articles. In this paper, we investigate the factors that make previously published studies easy or difficult to reproduce and report on a foundational framework for quantifying the effort of reproducibility.