Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist ($\textbf{Re}$porting Standards $\textbf{For}$ $\textbf{M}$achine Learning Based $\textbf{S}$cience). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.