ACETest: Automated Constraint Extraction for Testing Deep Learning Operators

Deep learning (DL) applications are now prevalent across a wide range of tasks, and DL libraries are essential for building them. DL operators, which perform computations on multi-dimensional data (tensors), are the fundamental building blocks of these libraries, so bugs in DL operators can have a significant impact. Testing is a practical approach for detecting such bugs. To test DL operators effectively, test cases must pass the operators' input validity checks so that they reach the core function logic. Extracting the input validation constraints is therefore essential for generating high-quality test cases. Existing techniques rely on either human effort or the documentation of DL library APIs to extract these constraints. They cannot extract complex constraints, and the constraints they extract may differ from the actual code implementation. To address this challenge, we propose ACETest, a technique that automatically extracts input validation constraints from the code to build valid yet diverse test cases that can effectively unveil bugs in the core function logic of DL operators. To this end, ACETest automatically identifies the input validation code in DL operators, extracts the related constraints, and generates test cases according to those constraints. Experimental results on two popular DL libraries, TensorFlow and PyTorch, demonstrate that ACETest extracts constraints of higher quality than state-of-the-art (SOTA) techniques. Moreover, ACETest extracts 96.4% more constraints and detects 1.95 to 55 times more bugs than SOTA techniques. In total, we have used ACETest to detect 108 previously unknown bugs in TensorFlow and PyTorch, 87 of which have been confirmed by the developers. Finally, five of these bugs were assigned CVE IDs due to their security impact.
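The core idea (locate an operator's validation checks, express them as constraints, and generate only inputs that satisfy them) can be illustrated with a toy example. This is a hedged sketch, not ACETest's implementation: the operator (`matmul_validate`, `run_operator`), the constraint list `CONSTRAINTS`, and the generator `generate` are all hypothetical. The constraints are written by hand rather than extracted from source code, and simple rejection sampling stands in for a proper constraint solver. The sketch shows why constraint-satisfying inputs reach the core logic while most purely random inputs are rejected by validation.

```python
import random
from typing import Callable, Dict, List

Case = Dict[str, int]
Constraint = Callable[[Case], bool]

# Hypothetical toy operator; not TensorFlow/PyTorch code and not ACETest itself.
def matmul_validate(a_shape, b_shape):
    """Input validation, mimicking the checks at the top of a DL kernel."""
    if len(a_shape) != 2 or len(b_shape) != 2:
        raise ValueError("inputs must be rank-2")
    if min(a_shape + b_shape) <= 0:
        raise ValueError("dimensions must be positive")
    if a_shape[1] != b_shape[0]:
        raise ValueError("inner dimensions must match")


def run_operator(a_shape, b_shape):
    """Validate first; only valid inputs reach the stand-in core logic."""
    matmul_validate(a_shape, b_shape)
    return (a_shape[0], b_shape[1])  # output shape of the core computation


# Hand-written constraints mirroring the checks above, over symbolic dims
# m, k1 (columns of A), k2 (rows of B) and n. Rank 2 holds by construction.
CONSTRAINTS: List[Constraint] = [
    lambda c: all(v > 0 for v in c.values()),
    lambda c: c["k1"] == c["k2"],
]

DOMAINS: Dict[str, List[int]] = {
    name: list(range(-2, 9)) for name in ("m", "k1", "k2", "n")
}


def generate(domains, constraints, n_cases, rng, max_tries=100_000):
    """Rejection sampling: draw random assignments, keep those satisfying all constraints."""
    cases: List[Case] = []
    for _ in range(max_tries):
        if len(cases) == n_cases:
            break
        cand = {name: rng.choice(vals) for name, vals in domains.items()}
        if all(check(cand) for check in constraints):
            cases.append(cand)
    return cases


def reaches_core(case: Case) -> bool:
    """True if the case passes validation and executes the core logic."""
    try:
        run_operator((case["m"], case["k1"]), (case["k2"], case["n"]))
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    rng = random.Random(0)
    naive = generate(DOMAINS, [], 200, rng)            # no constraints: pure random
    guided = generate(DOMAINS, CONSTRAINTS, 200, rng)  # constraint-satisfying only
    print("naive pass rate:", sum(map(reaches_core, naive)) / len(naive))
    print("guided pass rate:", sum(map(reaches_core, guided)) / len(guided))
```

With these domains, only a small fraction of unconstrained random shapes pass validation, whereas every constraint-guided case passes by construction. Rejection sampling becomes impractical as constraints grow tighter or more numerous, which is one reason solver-based generation is preferable for real operators.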
