Deep Learning (DL) is widely used across industries to improve decision-making and automate processes, driven by ever-evolving DL libraries and compilers. The correctness of DL systems is crucial for trust in DL applications. Accordingly, a recent wave of research has studied the automated synthesis of test cases (i.e., DNN models and their inputs) for fuzzing DL systems. However, existing model generators cover only a limited number of operators because they lack the ability to model operator constraints pervasively. To address this challenge, we propose NeuRI, a fully automated approach for generating valid and diverse DL models composed of hundreds of types of operators. NeuRI adopts a three-step process: (i) collecting valid and invalid API traces from various sources; (ii) applying inductive program synthesis over the traces to infer the constraints for constructing valid models; and (iii) performing hybrid model generation by incorporating both symbolic and concrete operators concolically. Our evaluation shows that NeuRI improves branch coverage of TensorFlow and PyTorch by 51% and 15%, respectively, over the state of the art. Within four months, NeuRI found 87 new bugs in PyTorch and TensorFlow, of which 64 have already been fixed or confirmed and 8 were labeled high-priority by PyTorch, constituting 10% of all high-priority bugs of the period. Additionally, open-source developers regard the error-inducing models we reported as "high-quality" and "common in practice".
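To make step (ii) concrete, the following is a minimal sketch of inferring a shape constraint from valid and invalid traces. It is not NeuRI's implementation: the matmul-like operator, the dimension names `a0`, `a1`, `b0`, `b1`, the equality-only hypothesis space, and all function names are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import Callable, Iterator, NamedTuple

# A predicate is a readable formula plus a function that evaluates it.
Predicate = tuple[str, Callable[[dict[str, int]], bool]]


class Trace(NamedTuple):
    """One recorded call of a hypothetical matmul-like operator A(a0, a1) x B(b0, b1)."""
    dims: dict[str, int]  # dimension name -> concrete value seen in the call
    valid: bool           # True if the call succeeded, False if it raised


def candidate_predicates(names) -> Iterator[Predicate]:
    """Enumerate a deliberately tiny hypothesis space: equalities between two dimensions."""
    for a, b in combinations(sorted(names), 2):
        yield f"{a} == {b}", (lambda d, a=a, b=b: d[a] == d[b])


def infer_constraints(traces: list[Trace]) -> list[Predicate]:
    """Keep predicates that hold on every valid trace and fail on at least one invalid trace."""
    valid = [t.dims for t in traces if t.valid]
    invalid = [t.dims for t in traces if not t.valid]
    kept = []
    for text, holds in candidate_predicates(traces[0].dims):
        consistent = all(holds(d) for d in valid)
        discriminating = any(not holds(d) for d in invalid)
        if consistent and discriminating:
            kept.append((text, holds))
    return kept


def accepts(constraints: list[Predicate], dims: dict[str, int]) -> bool:
    """Check whether a new set of concrete dimensions satisfies all inferred constraints."""
    return all(holds(dims) for _, holds in constraints)


# Illustrative traces (made up for this sketch, not taken from the paper).
TRACES = [
    Trace({"a0": 2, "a1": 3, "b0": 3, "b1": 4}, valid=True),
    Trace({"a0": 5, "a1": 7, "b0": 7, "b1": 5}, valid=True),
    Trace({"a0": 2, "a1": 3, "b0": 4, "b1": 4}, valid=False),
    Trace({"a0": 3, "a1": 2, "b0": 3, "b1": 2}, valid=False),
]

CONSTRAINTS = infer_constraints(TRACES)
print([text for text, _ in CONSTRAINTS])  # prints ['a1 == b0']
```

The second valid trace happens to satisfy `a0 == b1`, but the first valid trace refutes it, which shows why several traces are needed to rule out coincidental relations. A generator could then call `accepts` to filter candidate shapes before building a model.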