Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist ($\textbf{Re}$porting Standards $\textbf{For}$ $\textbf{M}$achine Learning Based $\textbf{S}$cience). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.