FirmReBugger: A Benchmark Framework for Monolithic Firmware Fuzzers

Monolithic firmware is widespread. Unsurprisingly, fuzz testing firmware is an active research field, with new advances addressing the unique challenges of the domain. However, understanding and evaluating these improvements through derived metrics such as code coverage and unique crashes is problematic, which motivates a reliable bug-based benchmark. To address this need, we design and build FirmReBugger, a holistic framework for fairly assessing monolithic firmware fuzzers with a realistic, diverse, bug-based benchmark. FirmReBugger uses bug oracles (C-syntax expressions of bug descriptors) together with an interpreter to automate analysis and accurately report the bugs discovered, discriminating between four states: detected, triggered, reached, and not reached. Importantly, our benchmarking approach does not modify the target binary; it simply replays fuzzing seeds, which isolates the benchmark implementation from the fuzzer and provides a simple means of extending the benchmark with new bug oracles. Further, we created FirmBench, a set of diverse, real-world binary targets with 313 software bug oracles. Because it incorporates our analysis of the roadblocks that challenge monolithic firmware fuzzing, the benchmark enables rapid evaluation of future advances. We implement FirmReBugger as a FuzzBench-for-Firmware style service and use FirmBench to evaluate 9 state-of-the-art monolithic firmware fuzzers in the style of a reproducibility study, a 10 CPU-year effort, and report our findings.
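To make the four bug states concrete, the following Python sketch shows one way a replayed execution could be classified against a bug oracle. It is an illustration under stated assumptions, not FirmReBugger's actual oracle format or interpreter: the `Oracle` fields, the `Step` trace record, the register names, and the rule that "detected" means "triggered and the replay ended in a crash" are all hypothetical. The condition evaluator handles only the small subset of C expression syntax that Python shares (comparisons, arithmetic, bitwise operators), plus `&&` and `||`.

```python
from dataclasses import dataclass
from enum import IntEnum


class BugState(IntEnum):
    """Ordered so that a higher value means more progress toward the bug."""
    NOT_REACHED = 0
    REACHED = 1
    TRIGGERED = 2
    DETECTED = 3


@dataclass
class Oracle:
    """Hypothetical oracle: a code location plus a C-style trigger condition."""
    bug_id: str
    reach_pc: int      # address of the buggy code (assumed field)
    trigger_expr: str  # e.g. "r2 > 64 && r0 != 0" (assumed syntax)


@dataclass
class Step:
    """One recorded point of a replayed seed (assumed trace format)."""
    pc: int
    regs: dict
    crashed: bool = False


def eval_condition(expr, regs):
    # Map the C logical operators onto Python's; everything else in the
    # supported subset reads the same in both languages.
    py_expr = expr.replace("&&", " and ").replace("||", " or ")
    # Only register names are visible to the expression; builtins are hidden.
    return bool(eval(py_expr, {"__builtins__": {}}, dict(regs)))


def classify(oracle, trace):
    """Return the highest bug state observed while replaying one seed."""
    state = BugState.NOT_REACHED
    for step in trace:
        if step.pc != oracle.reach_pc:
            continue
        state = max(state, BugState.REACHED)
        if eval_condition(oracle.trigger_expr, step.regs):
            state = max(state, BugState.TRIGGERED)
    # Simplifying assumption: a triggered bug counts as detected only if the
    # replayed run ended in a crash.
    if state == BugState.TRIGGERED and trace and trace[-1].crashed:
        state = BugState.DETECTED
    return state


if __name__ == "__main__":
    oracle = Oracle("demo-overflow", reach_pc=0x200, trigger_expr="r2 > 64 && r0 != 0")
    trace = [Step(0x100, {"r0": 1, "r2": 8}), Step(0x200, {"r0": 1, "r2": 100})]
    print(classify(oracle, trace).name)
```

Because the classification only consumes a recorded trace, the target binary stays unmodified and the same seeds can be re-scored whenever a new oracle is added, which mirrors the replay-based design described above.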
