The rising popularity of deep learning (DL) methods and techniques has invigorated interest in SE4DL, the application of software engineering (SE) practices to deep learning software. Despite the novel engineering challenges brought on by the data-driven and non-deterministic paradigm of DL software, little work has been invested in developing AI-targeted SE tools. On the other hand, tools tackling more general engineering issues in DL are actively used and are referred to under the umbrella term ``MLOps tools''. Furthermore, the available literature supports the utility of conventional SE tooling in DL software development. Building upon previous mining software repositories (MSR) research on tool usage in open-source software, we identify conventional and MLOps tools adopted in popular applied DL projects that use Python as the main programming language. About 70% of the GitHub repositories mined contained at least one conventional SE tool. Software configuration management tools are the most adopted, while maintenance tools are the least adopted. Substantially fewer MLOps tools were in use: of a sample of 80 tools, only 9 were used in at least one repository. The majority of these were open-source rather than proprietary. One of them, TensorBoard, was adopted in about half of the repositories in our study. The widespread use of conventional SE tooling thus demonstrates its relevance to DL software. Further research is recommended on the adoption of MLOps tooling by open-source projects, focusing on the relevance of particular tool types, the development of required tools, and ways to promote the use of already available tools.