Real-world programs that expect structured inputs often have a format-parsing stage that gates the deeper program space. Neither a mutation-based approach nor a generative approach can provide a solution that is both effective and scalable. Large language models (LLMs) pre-trained on enormous natural-language corpora have proved effective at understanding implicit format syntax and generating format-conforming inputs. In this paper, we propose ChatFuzz, a greybox fuzzer augmented by generative AI. More specifically, we pick a seed from the fuzzer's seed pool and prompt ChatGPT generative models to produce variations of it, which are more likely to be format-conforming and thus of high quality. We conduct extensive experiments to explore the best practice for harnessing the power of generative LLMs. The experimental results show that our approach improves edge coverage by 12.77\% over the SOTA greybox fuzzer (AFL++) on 12 target programs from three well-tested benchmarks. As for vulnerability detection, ChatFuzz performs similarly to or better than AFL++ on programs with explicit syntax rules, but not on programs with non-trivial syntax.
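The seed-selection and prompting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not ChatFuzz's actual implementation: the prompt wording, the `SEPARATOR` convention, and the names `build_prompt`, `llm_mutate`, and `fake_complete` are all assumptions made here, and `complete` is a hypothetical callable standing in for whatever chat-completion client is used.

```python
import random
from typing import Callable, List, Optional

# Assumed convention: the model is asked to separate variants with this line.
SEPARATOR = "\n---\n"


def build_prompt(seed: str, n_variants: int) -> str:
    """Build a (hypothetical) prompt asking for format-conforming variations."""
    return (
        f"Here is a sample input for a program:\n{seed}\n"
        f"Produce {n_variants} different inputs in the same format, "
        "separated by a line containing only '---'."
    )


def llm_mutate(
    seed_pool: List[str],
    complete: Callable[[str], str],
    n_variants: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick one seed, prompt the model, and return the parsed variants."""
    rng = rng or random.Random()
    seed = rng.choice(seed_pool)
    reply = complete(build_prompt(seed, n_variants))
    variants = [chunk.strip() for chunk in reply.split(SEPARATOR)]
    # Drop empty chunks and exact copies of the seed, which add nothing new.
    return [v for v in variants if v and v != seed][:n_variants]


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply for testing."""
    return SEPARATOR.join(['{"a": 2}', '{"a": [1, 2]}', '{"a": 1}'])


if __name__ == "__main__":
    pool = ['{"a": 1}']
    for variant in llm_mutate(pool, fake_complete, n_variants=2):
        print(variant)  # in a real fuzzer these would go back into the seed queue
```

In a real setup the returned variants would be handed to the greybox fuzzer as new candidate seeds, and the fuzzer's own coverage feedback would decide which ones to keep.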