Increased reproducibility of machine learning research has been a driving force behind dramatic improvements in learning performance. The scientific community reinforces this effort by including reproducibility ratings in reviewer forms and treating them as a crucial factor in the overall evaluation of papers. However, releasing source code alongside a paper is not sufficient to make the work reproducible: the shared code should also meet the ML reproducibility checklist. This work aims to support reproducibility evaluations of papers that release source code. We propose an end-to-end system that operates on the Readme files of source code repositories. The system checks whether a given Readme complies with a template proposed by a widely used platform for sharing research source code, and it produces an overall score by combining per-section scores with a custom function. We also train a hierarchical transformer model to assign a class label to a given Readme. Experimental results show that the section-similarity-based system outperforms the hierarchical transformer. It is also more explainable, since its score can be traced directly to individual sections of the Readme file.
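To make the idea of combining section scores more concrete, the following is a minimal Python sketch under stated assumptions; it is not the system described in this work. The template section names, the equal weights, the use of `difflib.SequenceMatcher` string similarity on Markdown headings, and the weighted mean used as the combination function are all hypothetical stand-ins chosen for illustration. The helper names `readme_headings`, `section_scores`, and `readme_score` are likewise hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical template sections and weights (illustrative assumptions only;
# not the template or weighting used in this work).
TEMPLATE_WEIGHTS = {
    "requirements": 1.0,
    "training": 1.0,
    "evaluation": 1.0,
    "pre-trained models": 1.0,
    "results": 1.0,
}

# Markdown ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def readme_headings(readme: str) -> list[str]:
    """Return the lower-cased Markdown headings found in a Readme."""
    headings = []
    for line in readme.splitlines():
        match = HEADING.match(line)
        if match:
            headings.append(match.group(1).strip().lower())
    return headings


def section_scores(readme: str) -> dict[str, float]:
    """Score each template section by its best heading similarity, in [0, 1].

    Assumption: similarity is plain string similarity between heading texts,
    a simple stand-in for a learned section-similarity measure.
    """
    headings = readme_headings(readme)
    return {
        section: max(
            (SequenceMatcher(None, section, h).ratio() for h in headings),
            default=0.0,  # no headings at all -> section counts as missing
        )
        for section in TEMPLATE_WEIGHTS
    }


def readme_score(readme: str) -> float:
    """Combine per-section scores with a weighted mean (an assumed combination)."""
    scores = section_scores(readme)
    total_weight = sum(TEMPLATE_WEIGHTS.values())
    return sum(TEMPLATE_WEIGHTS[s] * scores[s] for s in TEMPLATE_WEIGHTS) / total_weight


if __name__ == "__main__":
    example = (
        "# My Paper\n"
        "## Requirements\npip install -r requirements.txt\n"
        "## Training\npython train.py\n"
        "## Evaluation\npython eval.py\n"
        "## Pre-trained Models\nDownload weights from the release page.\n"
        "## Results\nSee the table in the paper.\n"
    )
    print(readme_score(example))  # prints 1.0
    print(section_scores("## Requirements\npip install torch\n"))
```

Because each template section keeps its own score before the combination step, a low overall score can be traced back to the specific missing or weakly matching sections, which reflects the explainability advantage of a section-based approach.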