Deep learning (DL) frameworks are the basis for constructing all DL programs and models, so their bugs can cause unexpected behavior in any DL program or model that relies on them. This broad impact demonstrates the necessity and importance of assuring the quality of DL frameworks. Understanding the characteristics of DL framework bugs is a fundamental step in this quality assurance task, as it facilitates the design of effective bug detection and debugging approaches. Hence, in this work we conduct the largest-scale study of its kind, covering 1,000 bugs from four popular and diverse DL frameworks (TensorFlow, PyTorch, MXNet, and DL4J). By analyzing the root causes and symptoms of DL framework bugs associated with five components decomposed from DL frameworks, and by measuring the test coverage achieved by three state-of-the-art testing techniques, we obtain 12 major findings that give a comprehensive understanding of DL framework bugs and of the current state of DL framework testing practice. We then provide a series of actionable guidelines for better DL framework bug detection and debugging. Finally, based on these guidelines, we design and implement a prototype DL framework testing tool, called TenFuzz, which a preliminary study shows to be effective: it finds 3 previously unknown bugs in the latest TensorFlow framework, indicating the significance of our guidelines.