MLFuzz, a work accepted at ACM FSE 2023, revisits the performance of NEUZZ, a machine learning-based fuzzer. We demonstrate that its main conclusion is entirely wrong due to several fatal implementation bugs and flawed evaluation setups: an initialization bug in persistent mode, a program crash, an error in training dataset collection, and a mistake in fuzzing result collection. Additionally, MLFuzz trains on noisy datasets without sufficient data cleaning and preprocessing, which contributes to the drastic performance drop it reports for NEUZZ. We address these issues and provide a corrected implementation and evaluation setup, showing that NEUZZ consistently outperforms AFL on the FuzzBench dataset. Finally, we reflect on the evaluation methodology used in MLFuzz and offer practical advice on fair and scientific fuzzing evaluation.