The rising popularity of deep learning (DL) methods and techniques has invigorated interest in SE4DL (Software Engineering for Deep Learning), the application of software engineering (SE) practices to deep learning software. DL software's data-driven and non-deterministic paradigm brings novel engineering challenges. Despite this, little work has gone into developing DL-targeted SE tools. In contrast, tools that tackle DL-specific, non-SE issues are actively used and are referred to under the umbrella term "MLOps (Machine Learning Operations) tools". At the same time, the available literature supports the utility of conventional SE tooling in DL software development. Building on previous mining software repositories (MSR) research on tool usage in open-source software projects, we identify the conventional and MLOps tools adopted in popular applied DL projects that use Python as their main programming language. About 63\% of the GitHub repositories we examined contained at least one conventional SE tool. Software construction tools are the most widely adopted, whereas management and maintenance tools are the least adopted. Relatively few MLOps tools were found to be used: only 20 of the 74 sampled tools appeared in at least one repository, and most of these were open-source rather than proprietary. One of them, TensorBoard, was adopted in about half of the repositories in our study. Overall, the widespread use of conventional SE tooling demonstrates its relevance to DL software. We recommend further research on the adoption of MLOps tooling, focusing on the relevance of particular tool types, the development of required tools, and ways to promote the use of tools that are already available.