Open-sourcing the code behind research publications is a key enabler for the reproducibility of studies and the collective scientific progress of a research community. As all fields of science develop more advanced algorithms, we become more dependent on complex computational toolboxes; sharing research ideas solely through equations and proofs is no longer sufficient to communicate scientific developments. In recent years, several efforts have highlighted the importance and challenges of transparent and reproducible research, and code sharing is one of the key necessities in such efforts. In this article, we study the impact of code release on scientific research and present statistics from three research communities: machine learning, robotics, and control. We found that, over a six-year period (2016-2021), the percentage of papers with code at major machine learning, robotics, and control conferences at least doubled. Moreover, high-impact papers were generally supported by open-source code. As an example, the top 1% of most cited papers at the Conference on Neural Information Processing Systems (NeurIPS) consistently included open-source code. In addition, our analysis shows that popular code repositories are generally associated with highly cited papers, which further highlights the coupling between code sharing and the impact of scientific research. While the trends are encouraging, we would like to continue to promote and increase our efforts toward transparent, reproducible research that accelerates innovation; releasing code with our papers is a clear first step.