How can one search for bugs in 1,000 programs using a pre-existing fuzzer and a standard PC? We consider this problem and show that a well-designed strategy for deciding which programs to fuzz, and for how long, can greatly affect the number of bugs found across the programs. In fact, the impact of an effective strategy is comparable to that of using a state-of-the-art fuzzer. We refer to the problem as fuzzing at scale and to the strategy as a scheduler. Besides a naive scheduler, which allocates equal fuzz time to all programs, we consider dynamic schedulers, which adjust the time allocation based on the ongoing fuzzing progress of individual programs. Dynamic schedulers are superior because they yield both a higher total number of bugs found and a higher number of bugs found for most programs. The performance gap between naive and dynamic schedulers can be as wide as, or even wider than, the gap between two fuzzers. Our findings thus suggest that advancing schedulers is a fundamental problem for fuzzing at scale. We develop several schedulers and use the most sophisticated one to simultaneously fuzz our newly compiled benchmark of around 5,000 Ubuntu programs, detecting 4,908 bugs.
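To make the distinction between naive and dynamic scheduling concrete, the following is a minimal sketch in Python. It is not the schedulers developed in this work: the toy program model (a list of cumulative fuzz times at which each bug would surface), the greedy bugs-per-slice rule, and all function names are assumptions chosen purely for illustration.

```python
from typing import List, Sequence

# Hypothetical toy model (an assumption, not from the paper): a program is a
# list of cumulative fuzz times, in slices, at which its bugs would be found.
Program = Sequence[int]


def bugs_found(program: Program, time_spent: int) -> int:
    """Number of bugs observed after fuzzing `program` for `time_spent` slices."""
    return sum(1 for t in program if t <= time_spent)


def naive_scheduler(programs: Sequence[Program], budget: int) -> List[int]:
    """Allocate an equal share of the time budget to every program."""
    share = budget // len(programs)
    return [share] * len(programs)


def dynamic_scheduler(programs: Sequence[Program], budget: int,
                      warmup: int = 1) -> List[int]:
    """Illustrative greedy rule: after a warm-up slice per program, give each
    remaining slice to the program with the highest observed bugs-per-slice
    rate so far. Ties go to the lowest program index."""
    n = len(programs)
    spent = [warmup] * n
    for _ in range(budget - warmup * n):
        best = max(
            range(n),
            key=lambda i: (bugs_found(programs[i], spent[i]) / spent[i], -i),
        )
        spent[best] += 1
    return spent


def total_bugs(programs: Sequence[Program], allocation: Sequence[int]) -> int:
    """Total bugs found across all programs under a given time allocation."""
    return sum(bugs_found(p, t) for p, t in zip(programs, allocation))


if __name__ == "__main__":
    # One bug-rich program, one slow-to-yield program, two bug-free programs.
    programs = [[1, 2, 3, 4, 5, 6], [], [5], []]
    budget = 8
    for scheduler in (naive_scheduler, dynamic_scheduler):
        allocation = scheduler(programs, budget)
        print(scheduler.__name__, allocation, total_bugs(programs, allocation))
```

In this toy instance the greedy rule concentrates the budget on the one program that keeps yielding bugs, so it finds more bugs in total than the equal split. The rule only uses information observed so far, which is what makes it a dynamic scheduler in the sense described above; a practical scheduler would also need some exploration so that slow-starting programs are not starved.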