As deep learning (DL) is applied more widely, tools for testing DL frameworks are in high demand. Existing DL framework testing tools cover a limited range of bug types. For example, they cannot effectively find performance bugs, which matter for DL models in terms of performance, economic cost, and environmental impact. Moreover, existing tools are inefficient: they generate hundreds of test cases, few of which trigger bugs. In this paper, we propose Citadel, a method that makes bug finding both more efficient and more effective. We observe that many DL framework bugs are similar to one another because operators and algorithms in the same family are similar. Orthogonal to existing bug-finding tools, Citadel aims to find new bugs that resemble reported bugs with known test oracles. Citadel defines context similarity to measure how similar a pair of DL framework APIs is, and automatically generates test cases with oracles for APIs that are similar to the problematic APIs in existing bug reports. Citadel detects 58 and 66 API bugs in PyTorch and TensorFlow, respectively (excluding those rejected by developers or duplicating prior reports), many of which, e.g., 13 performance bugs, cannot be detected by existing tools. Moreover, 35.40% of the test cases generated by Citadel trigger bugs, far exceeding the state-of-the-art method (3.90%).
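The idea of selecting APIs that are similar to a reported buggy API can be illustrated with a minimal sketch. The abstract does not define context similarity, so the code below substitutes a plain Jaccard similarity over parameter names as a stand-in; it is not Citadel's actual metric. The `API_PARAMS` table, the `jaccard` and `similar_apis` helpers, and the 0.5 threshold are all hypothetical, and the parameter lists are simplified for illustration rather than exact framework signatures.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical, simplified parameter sets (NOT exact framework signatures).
API_PARAMS: Dict[str, FrozenSet[str]] = {
    "avg_pool1d": frozenset({"input", "kernel_size", "stride", "padding", "ceil_mode"}),
    "avg_pool2d": frozenset({"input", "kernel_size", "stride", "padding", "ceil_mode"}),
    "max_pool2d": frozenset(
        {"input", "kernel_size", "stride", "padding", "dilation", "ceil_mode"}
    ),
    "softmax": frozenset({"input", "dim", "dtype"}),
}


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets; an assumed stand-in for context similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_apis(buggy_api: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Rank the other APIs whose similarity to `buggy_api` meets the threshold."""
    base = API_PARAMS[buggy_api]
    scored = [
        (name, jaccard(base, params))
        for name, params in API_PARAMS.items()
        if name != buggy_api
    ]
    kept = [(name, score) for name, score in scored if score >= threshold]
    # Highest similarity first; ties broken alphabetically for determinism.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Candidate APIs to which a test oracle from an "avg_pool1d" bug report might transfer.
    for name, score in similar_apis("avg_pool1d"):
        print(f"{name}: {score:.2f}")
```

In this toy setup, a bug reported against `avg_pool1d` would suggest `avg_pool2d` and `max_pool2d` as candidates for reusing the report's test oracle, while `softmax` falls below the threshold and is skipped.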