Reproducibility is one of the core dimensions that contribute to Trustworthy Artificial Intelligence. Broadly speaking, reproducibility can be defined as the possibility of repeating the same or a similar experiment or method and thereby obtaining the same or similar results as the original scientists. It is an essential ingredient of the scientific method and crucial for gaining trust in the relevant claims. Scientists have recently acknowledged a reproducibility crisis, which appears to affect Artificial Intelligence and Machine Learning even more severely because of the complexity of the models behind their recent successes. Despite the recent debate on reproducibility in Artificial Intelligence, its practical implementation remains insufficient, partly because many technical issues are overlooked. In this survey, we critically review the current literature on the topic and highlight the open issues. Our contribution is three-fold. First, we propose a concise review of the terminology involved. Second, we collect and systematize existing recommendations for achieving reproducibility and put forward the means to comply with them. Third, we identify key elements often overlooked in modern Machine Learning and provide novel recommendations for them. We further specialize these recommendations for two critical application domains, namely biomedical and physical artificial intelligence.