ACETest: Automated Constraint Extraction for Testing Deep Learning Operators

Deep learning (DL) applications are now widespread across many tasks, and DL libraries are essential for building them. DL operators, which perform computations on multi-dimensional data (tensors), are the core building blocks of these libraries, so bugs in operators can have far-reaching impact. Testing is a practical way to detect such bugs. To test DL operators effectively, test cases must pass the operator's input validity checks and reach its core function logic; extracting the input validation constraints is therefore required for generating high-quality test cases. Existing techniques rely on either human effort or the documentation of DL library APIs to extract constraints. They cannot extract complex constraints, and the constraints they extract may differ from the actual code implementation. To address this challenge, we propose ACETest, a technique that automatically extracts input validation constraints from the code in order to build valid yet diverse test cases that can effectively unveil bugs in the core function logic of DL operators. To this end, ACETest automatically identifies the input validation code in DL operators, extracts the related constraints, and generates test cases according to those constraints. Experimental results on two popular DL libraries, TensorFlow and PyTorch, demonstrate that ACETest extracts higher-quality constraints than state-of-the-art (SOTA) techniques. Moreover, ACETest extracts 96.4% more constraints and detects 1.95 to 55 times more bugs than SOTA techniques. In total, we have used ACETest to detect 108 previously unknown bugs in TensorFlow and PyTorch, 87 of which have been confirmed by the developers. Five of these bugs were assigned CVE IDs because of their security impact.
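To make the idea of constraint-guided test generation concrete, the following is a minimal sketch, not ACETest's actual implementation. It assumes a hypothetical toy operator (`toy_matmul_op`) whose input validation requires two rank-2 shapes with matching inner dimensions, writes those checks down as explicit constraints, and uses simple rejection sampling as a stand-in for whatever constraint-solving strategy the real tool employs. All names here are illustrative assumptions.

```python
import random

# Hypothetical constraints mirroring the validation code of a toy
# 2-D matrix-multiply operator. Order matters: the rank checks run
# before the dimension check, so indexing below is always safe.
CONSTRAINTS = [
    lambda a, b: len(a) == 2,      # first input must have rank 2
    lambda a, b: len(b) == 2,      # second input must have rank 2
    lambda a, b: a[1] == b[0],     # inner dimensions must match
]


def is_valid(a_shape, b_shape):
    """Return True if the shapes satisfy every extracted constraint."""
    return all(check(a_shape, b_shape) for check in CONSTRAINTS)


def random_shape(rng, max_rank=3, max_dim=4):
    """Draw a random tensor shape with rank in [1, max_rank]."""
    rank = rng.randint(1, max_rank)
    return tuple(rng.randint(1, max_dim) for _ in range(rank))


def generate_valid_inputs(n, seed=0, max_tries=100_000):
    """Rejection-sample shape pairs until n of them pass all constraints.

    This is an assumption-laden simplification: a real tool would
    typically solve the constraints rather than sample blindly.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(max_tries):
        if len(cases) == n:
            break
        a, b = random_shape(rng), random_shape(rng)
        if is_valid(a, b):
            cases.append((a, b))
    return cases


def toy_matmul_op(a_shape, b_shape):
    """Stand-in operator: input validation first, then the 'core logic'
    (here, just computing the output shape)."""
    if not is_valid(a_shape, b_shape):
        raise ValueError("invalid input shapes")
    return (a_shape[0], b_shape[1])


if __name__ == "__main__":
    for a, b in generate_valid_inputs(5):
        # Every generated case passes validation and reaches the core logic.
        print(a, b, "->", toy_matmul_op(a, b))
```

Purely random inputs would be rejected by the validation check most of the time in this toy setting; filtering by the constraints ensures each generated case exercises the code past the check, which is the property the abstract identifies as essential.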
