Fuzzing MLIR Compilers with Custom Mutation Synthesis

Compiler technologies in deep learning and domain-specific hardware acceleration are increasingly adopting extensible compiler frameworks such as Multi-Level Intermediate Representation (MLIR) to make development more efficient. With MLIR, compiler developers can easily define their own custom IRs in the form of MLIR dialects. However, the diversity and rapid evolution of such custom IRs make it impractical to manually write a custom test generator for each dialect. To address this problem, we design a new test generator called SYNTHFUZZ that combines grammar-based fuzzing with custom mutation synthesis. The key insight of SYNTHFUZZ is twofold: (1) it automatically infers parameterized, context-dependent custom mutations from existing test cases; (2) it then concretizes each mutation's content according to the target context and reduces the chance of inserting invalid edits by performing k-ancestor and prefix/postfix matching. SYNTHFUZZ obviates the need to manually define custom mutation operators for each dialect. We compare SYNTHFUZZ to three baselines, Grammarinator, MLIRSmith, and NeuRI, on four different MLIR projects. Each project defines a new set of MLIR dialects for which manually writing a custom test generator would take weeks of effort. Our evaluation shows that SYNTHFUZZ on average improves MLIR dialect-pair coverage by 1.75 times, which increases branch coverage by 1.22 times. Further, we show that our context-dependent custom mutation increases the proportion of valid tests by up to 1.11 times, indicating that SYNTHFUZZ correctly concretizes its parameterized mutations with respect to the target context. Parameterization of the mutations reduces the fraction of tests violating the base MLIR constraints by 0.57 times, increasing the time spent fuzzing dialect-specific code.
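To make the matching step concrete, the following is a minimal sketch of one way k-ancestor and prefix/postfix matching, followed by concretization, could be expressed. It is an illustration under stated assumptions and not SYNTHFUZZ's actual implementation: the `Node` class, the helper names (`k_ancestors`, `sibling_context`, `context_matches`, `concretize`), the default values of `k` and `width`, and the template syntax for mutation parameters are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical parse-tree node; `label` stands in for a grammar rule or op name."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def k_ancestors(node: Node, k: int) -> List[str]:
    """Labels of up to k nearest ancestors, closest first."""
    labels: List[str] = []
    cur = node.parent
    while cur is not None and len(labels) < k:
        labels.append(cur.label)
        cur = cur.parent
    return labels


def sibling_context(node: Node, width: int) -> Tuple[List[str], List[str]]:
    """Labels of up to `width` siblings before (prefix) and after (postfix) the node."""
    siblings = node.parent.children if node.parent else [node]
    i = next(idx for idx, s in enumerate(siblings) if s is node)
    prefix = [s.label for s in siblings[max(0, i - width):i]]
    postfix = [s.label for s in siblings[i + 1:i + 1 + width]]
    return prefix, postfix


def context_matches(donor: Node, target: Node, k: int = 2, width: int = 1) -> bool:
    """Assumed rule: a mutation mined at `donor` may be applied at `target`
    only if both the k-ancestor chain and the sibling prefix/postfix agree."""
    return (k_ancestors(donor, k) == k_ancestors(target, k)
            and sibling_context(donor, width) == sibling_context(target, width))


def concretize(template: str, bindings: Dict[str, str]) -> str:
    """Fill the mutation's parameters with names taken from the target context."""
    return template.format(**bindings)


def demo() -> Tuple[bool, bool, str]:
    # Donor: module > func > [arith.addi, arith.muli, return]
    m1 = Node("module")
    f1 = m1.add(Node("func"))
    f1.add(Node("arith.addi"))
    donor = f1.add(Node("arith.muli"))
    f1.add(Node("return"))

    # Compatible target: same ancestors and same neighbouring siblings.
    m2 = Node("module")
    f2 = m2.add(Node("func"))
    f2.add(Node("arith.addi"))
    good = f2.add(Node("arith.subi"))
    f2.add(Node("return"))

    # Incompatible target: nested inside a loop, so the ancestor chain differs.
    m3 = Node("module")
    loop = m3.add(Node("func")).add(Node("scf.for"))
    loop.add(Node("arith.addi"))
    bad = loop.add(Node("arith.subi"))
    loop.add(Node("scf.yield"))

    text = concretize("%{res} = arith.muli %{lhs}, %{rhs} : i32",
                      {"res": "2", "lhs": "0", "rhs": "1"})
    return context_matches(donor, good), context_matches(donor, bad), text


if __name__ == "__main__":
    print(demo())
```

In this sketch the mutation is accepted at the first target and rejected at the second, because the second target's nearest ancestor is `scf.for` rather than `func`. Larger values of `k` and `width` make matching stricter, trading mutation diversity for a lower chance of producing an invalid edit.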
