Research is facing a reproducibility crisis, in which the results and findings of many studies are difficult or even impossible to reproduce. Machine learning (ML) and artificial intelligence (AI) research is no exception. Common causes are unpublished data and/or source code and the sensitivity of results to ML training conditions. Although the research community has discussed various solutions, such as using ML platforms, the level of reproducibility in ML-driven research is not increasing substantially. Therefore, in this mini survey, we review the literature on reproducibility in ML-driven research with three main aims: (i) to reflect on the current state of ML reproducibility in various research fields, (ii) to identify the reproducibility issues and barriers that exist in the research fields applying ML, and (iii) to identify potential drivers, such as tools, practices, and interventions, that support ML reproducibility. With this, we hope to inform decisions on the viability of different solutions for supporting ML reproducibility.