Deep Learning (DL) compilers typically load a DL model and optimize it through intermediate representations (IRs). Existing DL compiler testing techniques mainly focus on the model optimization stages and rarely explore bug detection at the model loading stage. Effectively testing the model loading stage requires covering diverse usages of each DL operator from various DL libraries. DL library testing shares this objective, which suggests that the knowledge embedded in DL library tests can benefit testing of the model loading stage of DL compilers. In this work, we propose OPERA to extract such domain knowledge from the test inputs for DL libraries. OPERA constructs diverse tests from various test inputs for DL libraries, including the test inputs documented in DL libraries and those generated by recent fuzzers. It also incorporates a diversity-based test prioritization strategy that migrates and executes first the test inputs that are more likely to detect diverse bugs. We considered three sources of tests in DL libraries for migration and used eight frontends from three DL compilers (i.e., TVM, TensorRT, and OpenVINO) for evaluation. OPERA detected 170 previously unknown bugs in total, 90 of which have been confirmed/fixed by developers, demonstrating the effectiveness of this migration-based idea. Compared to general test prioritization strategies, the test prioritization strategy in OPERA improves the efficiency of testing with migrated tests by 11.9% to 47.4% on average.
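The abstract does not specify how OPERA measures diversity, so the following is only a rough illustration of what a diversity-based test prioritization can look like in general, not OPERA's actual algorithm. It assumes each migrated test is summarized by a hypothetical set of feature strings (for example operator name, dtype, and attributes used) and orders tests by greedy farthest-first selection under Jaccard distance. The names `jaccard_distance` and `prioritize` and the feature encoding are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def prioritize(tests: Dict[str, FrozenSet[str]]) -> List[str]:
    """Greedy farthest-first ordering (a generic diversity heuristic).

    Each step picks the remaining test whose minimum distance to all
    already-selected tests is largest, so near-duplicate tests sink to
    the end. Ties are broken by test name for determinism.
    """
    if not tests:
        return []
    remaining = dict(tests)
    # Seed with the test covering the most features (first name wins ties).
    first = max(sorted(remaining), key=lambda t: len(remaining[t]))
    order = [first]
    seed_feats = remaining.pop(first)
    # min_dist[t] = distance from t to its nearest already-selected test.
    min_dist = {t: jaccard_distance(f, seed_feats) for t, f in remaining.items()}
    while remaining:
        nxt = max(sorted(remaining), key=lambda t: min_dist[t])
        order.append(nxt)
        feats = remaining.pop(nxt)
        del min_dist[nxt]
        # Only the newly selected test can shrink the nearest distances.
        for t, f in remaining.items():
            min_dist[t] = min(min_dist[t], jaccard_distance(f, feats))
    return order


if __name__ == "__main__":
    # Hypothetical feature sets for four migrated single-operator tests.
    tests = {
        "conv2d_basic": frozenset({"op:conv2d", "dtype:float32", "rank:4"}),
        "conv2d_groups": frozenset({"op:conv2d", "dtype:float32", "rank:4", "attr:groups"}),
        "matmul_int8": frozenset({"op:matmul", "dtype:int8", "rank:2"}),
        "conv2d_dup": frozenset({"op:conv2d", "dtype:float32", "rank:4"}),
    }
    print(prioritize(tests))
    # -> ['conv2d_groups', 'matmul_int8', 'conv2d_basic', 'conv2d_dup']
```

With this ordering, the `matmul` test, which shares no features with the seed, runs second, and the exact duplicate `conv2d_dup` runs last. Farthest-first selection is chosen here only because it is a simple, well-known way to spread early executions across dissimilar inputs.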