Greybox fuzzing is one of the most popular methods for detecting software vulnerabilities; it conducts a biased random search within the program input space. To improve its ability to achieve deep coverage of program behaviors, greybox fuzzing is often combined with concolic execution, which performs a path-sensitive search over the domain of program inputs. In hybrid fuzzing, conventional greybox fuzzing is followed by concolic execution in an iterative loop, where reachability roadblocks encountered by greybox fuzzing are tackled by concolic execution. However, such hybrid fuzzing still suffers from the difficulties traditionally faced by symbolic execution, such as the need for environment modeling and system call support. In this work, we show how to achieve the effect of concolic execution without having to compute and solve symbolic path constraints. When coverage-based greybox fuzzing fails to reach certain branches, we slice the execution trace and suggest modifications to the input that would reach those branches. A Large Language Model (LLM) is used as a solver to generate the modified input for reaching the desired branches. Compared with both the vanilla greybox fuzzer AFL and the hybrid fuzzers Intriguer and Qsym, our LLM-based hybrid fuzzer HyLLfuzz (pronounced "hill fuzz") achieves higher coverage. Furthermore, the LLM-based concolic execution in HyLLfuzz runs 4-19 times faster than the concolic execution in existing hybrid fuzzing tools. This experience shows that LLMs can be effectively inserted into the iterative loop of hybrid fuzzers to expose more program behaviors efficiently.
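The control flow of a hybrid loop like the one described above can be illustrated with a small sketch. Everything in it is a hypothetical assumption for illustration: the toy `target` program, the mutation operators, the plateau threshold, and especially `stub_solver`. `stub_solver` is a hand-written stand-in for the LLM query. It hard-codes what a model might infer from a slice of the trace and does not reproduce HyLLfuzz's actual slicing or prompting. The sketch shows only the control flow: random mutation runs until coverage stops growing, and then an external solver is asked for inputs that reach the uncovered branches.

```python
import random
from typing import Callable, List, Set, Tuple

# Hypothetical toy program under test: returns the set of branch IDs it covers.
ALL_BRANCHES = {"entry", "len_ok", "magic_ok", "deep"}

def target(data: bytes) -> Set[str]:
    covered = {"entry"}
    if len(data) >= 4:
        covered.add("len_ok")
        if data[:4] == b"HILL":  # magic-value check: a roadblock for random mutation
            covered.add("magic_ok")
            if len(data) > 4 and data[4] == 0x7F:
                covered.add("deep")
    return covered

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, append, or delete."""
    buf = bytearray(data or b"\x00")
    op = rng.randrange(3)
    if op == 0:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif op == 1:
        buf.append(rng.randrange(256))
    elif len(buf) > 1:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)

def stub_solver(seed: bytes, uncovered: Set[str]) -> List[bytes]:
    # Hypothetical stand-in for the LLM: it hard-codes the b"HILL" and 0x7F
    # comparisons that a model would be expected to infer from the sliced trace.
    if "magic_ok" in uncovered or "deep" in uncovered:
        return [b"HILL" + seed[4:], b"HILL\x7f"]
    return []

def hybrid_fuzz(seed: bytes,
                solver: Callable[[bytes, Set[str]], List[bytes]],
                iterations: int = 2000,
                plateau: int = 200,
                rng_seed: int = 0) -> Tuple[Set[str], int]:
    rng = random.Random(rng_seed)
    corpus = [seed]
    coverage = set(target(seed))
    stale = 0          # iterations since the last new coverage
    solver_calls = 0
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        new = target(child) - coverage
        if new:        # keep inputs that reach new branches
            coverage |= new
            corpus.append(child)
            stale = 0
        else:
            stale += 1
        if stale >= plateau:  # roadblock: greybox fuzzing stopped making progress
            solver_calls += 1
            for cand in solver(rng.choice(corpus), ALL_BRANCHES - coverage):
                new = target(cand) - coverage
                if new:
                    coverage |= new
                    corpus.append(cand)
            stale = 0
        if coverage == ALL_BRANCHES:
            break
    return coverage, solver_calls
```

For example, `hybrid_fuzz(b"AAAA", stub_solver)` reaches all four branches after a single solver call. With a solver that returns nothing (`lambda s, u: []`), the loop stays at `{"entry", "len_ok"}`, because no single mutation can turn `AAAA` into the `HILL` prefix. The plateau counter is a simple stand-in for the roadblock signal that would hand control to the solver.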