Modern compilers such as LLVM are complex pieces of software. Because of this complexity, manual testing is unlikely to suffice, yet formal verification is difficult to scale. End-to-end fuzzing can be used, but it struggles to achieve high coverage of some LLVM components. In this paper, we implement IRFuzzer to investigate the effectiveness of specialized fuzzing of the LLVM compiler backend. We focus on two approaches to improving the fuzzer: guaranteeing input validity through constrained mutations, and improving feedback quality. The mutator in IRFuzzer can generate a wide range of LLVM IR inputs, including structured control flow, vector types, and function definitions. The system instruments coding patterns in the compiler to monitor the execution status of instruction selection. This instrumentation not only provides a new form of coverage feedback, called matcher table coverage, but also gives the mutator architecture-specific guidance. We show that IRFuzzer is more effective than existing fuzzers by fuzzing 29 mature LLVM backend targets. In the process, we reported 74 confirmed new bugs to upstream LLVM, of which 49 have been fixed and five have been backported to LLVM 15, showing that specialized fuzzing provides useful and actionable insights to LLVM developers.
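To make the idea of coverage feedback over a matcher table concrete, the following is a minimal sketch and not IRFuzzer's implementation. It assumes that each run of instruction selection can report the set of matcher table indices it reached, and that a mutated input is kept only when it reaches an index not seen before. All names here (`fuzz_loop`, `toy_run_isel`, `toy_mutate`, `TABLE_SIZE`) are hypothetical, and the "compiler" is a toy stand-in in which an input is a tuple of opcode numbers and each opcode maps to one table entry.

```python
import random

TABLE_SIZE = 8  # hypothetical number of matcher table entries


def toy_run_isel(program):
    """Toy stand-in for instruction selection: each opcode hits one table entry."""
    return {opcode % TABLE_SIZE for opcode in program}


def toy_mutate(program, rng):
    """Toy stand-in for a validity-preserving mutation: append one valid opcode."""
    return program + (rng.randrange(TABLE_SIZE),)


def fuzz_loop(seeds, run_isel, mutate, iterations, rng):
    """Keep a mutated input only if it reaches matcher table entries not seen before."""
    corpus = list(seeds)
    covered = set()
    for seed in corpus:
        covered |= run_isel(seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        fresh = run_isel(candidate) - covered  # entries reached for the first time
        if fresh:
            covered |= fresh
            corpus.append(candidate)
    return corpus, covered


if __name__ == "__main__":
    corpus, covered = fuzz_loop([(0,)], toy_run_isel, toy_mutate, 200, random.Random(0))
    print(f"corpus size: {len(corpus)}, entries covered: {len(covered)}/{TABLE_SIZE}")
```

In this toy setting the uncovered entries could also be fed back into the mutator to bias which opcodes it tries next, which is one way to read the architecture-specific guidance described above; the sketch leaves that step out.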