Although repeatability and reproducibility are essential to science, failed attempts to replicate results across diverse fields have led some scientists to argue that there is a reproducibility crisis. In response, several high-profile venues within computing established artifact evaluation tracks (systematic procedures for evaluating and badging research artifacts), and the number of submitted artifacts continues to grow. This study compiles recent artifact evaluation procedures and guidelines to show that artifact evaluation in distributed systems research lags behind other computing disciplines and is less unified and more complex. We further argue that current artifact assessment criteria are uncoordinated and insufficient for the unique challenges of distributed systems research. We examine the current state of the practice for artifacts and their evaluation, and we provide recommendations to assist artifact authors, reviewers, and track chairs. We summarize these recommendations and best practices as checklists for artifact authors and evaluation committees. Although our recommendations alone will not resolve the repeatability and reproducibility crisis, we hope to start a discussion in our community that increases the number and quality of submitted artifacts over time.