Deep learning has gained recognition for its impact on the natural sciences, yet the prohibitive financial and technical costs of training models from scratch inhibit adoption. Following guidance from the software engineering community, natural scientists are reusing pre-trained deep learning models (PTMs) to amortize these costs. While prior work recommends PTM reuse patterns, we present the first empirical study of PTM reuse patterns in the natural sciences, quantifying the utilization and impact of PTM reuse within the scientific process across 17,718 peer-reviewed, open-access papers. Our results show that "Biochemistry, Genetics and Molecular Biology" has outpaced other natural science fields in PTM reuse, that "adaptation" reuse is the most prevalent PTM reuse pattern across all natural science fields, and that the "testing" stage of the scientific process has been most impacted by PTM integration.