Testing with randomly generated inputs (fuzzing) has gained significant traction due to its capacity to expose program vulnerabilities automatically. Fuzz testing campaigns generate large amounts of data, making them ideal for the application of machine learning (ML). Neural program smoothing (NPS), a specific family of ML-guided fuzzers, aims to use a neural network as a smooth approximation of the target program for new test case generation. In this paper, we conduct the most extensive evaluation of NPS fuzzers against standard gray-box fuzzers (>11 CPU years and >5.5 GPU years), and make the following contributions: (1) We find that the original performance claims for NPS fuzzers do not hold, a gap we relate to fundamental, implementation, and experimental limitations of prior works. (2) We contribute the first in-depth analysis of the contribution of machine learning and gradient-based mutations in NPS. (3) We implement Neuzz++, which shows that addressing the practical limitations of NPS fuzzers improves performance, but that standard gray-box fuzzers almost always surpass NPS-based fuzzers. (4) As a consequence, we propose new guidelines for benchmarking ML-based fuzzing, and present MLFuzz, a platform with GPU access for easy and reproducible evaluation of ML-based fuzzers. Neuzz++, MLFuzz, and all our data are public.