Deep Learning (DL) models have advanced rapidly, with a focus on achieving high performance through testing of model accuracy and robustness. However, DL projects are also software systems, and it is unclear whether they are thoroughly tested or functionally correct when treated and tested like other software systems. We therefore empirically study unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: 1) unit-tested DL projects correlate positively with open-source project metrics and have a higher pull request acceptance rate; 2) 68% of the sampled DL projects are not unit tested at all; and 3) the layers and utilities (utils) of DL models have the most unit tests. Based on these findings and previous research outcomes, we build a taxonomy that maps unit tests to faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to the community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.