Directed greybox fuzzing (DGF) aims to efficiently trigger bugs at specific target locations by prioritizing seeds whose execution paths are more likely to reach the targets. However, existing DGF approaches estimate a seed's potential imprecisely because they rely on static-analysis-based distance metrics. The over-approximation inherent in static analysis causes many seeds whose execution paths are irrelevant to triggering the vulnerability to be mistakenly prioritized, significantly reducing fuzzing efficiency. To address this issue, we propose trace-guided directed greybox fuzzing (TDGF). TDGF replaces static-analysis-based distance metrics with vulnerability-oriented execution information (referred to as guidance traces) to steer directed fuzzing: seeds whose execution paths overlap more with the guidance traces are scheduled earlier for mutation. We empirically study two representative types of guidance traces: the control-flow trace and the call-stack trace of vulnerability-triggering executions. We find that coarse-grained call-stack traces offer nearly the same guidance capability as fine-grained control-flow traces, while being easier for large language models (LLMs) to predict. Based on this insight, we further propose a framework that leverages LLMs to predict the call stack at vulnerability-triggering time and uses it to guide DGF. We implement our approach and evaluate it against several state-of-the-art fuzzers in experiments totaling 58.4 CPU-years. On a suite of real-world programs, our approach triggers vulnerabilities 2.13$\times$ to 3.14$\times$ faster than the baselines. Moreover, through directed patch testing on the latest versions of the programs used in our controlled experiments, our approach discovers 10 new vulnerabilities and 2 incomplete fixes, with 10 CVE IDs assigned.
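The scheduling idea described above (seeds that overlap more with the guidance trace are mutated earlier) can be illustrated with a minimal sketch. The abstract does not specify how overlap is measured, so the metric below (the fraction of distinct guidance-trace functions that a seed executed) is an assumption, as are the names `Seed`, `overlap_score`, and `schedule` and the example function names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Seed:
    name: str                 # hypothetical seed identifier
    executed: frozenset[str]  # functions observed while running this seed


def overlap_score(seed: Seed, guidance_trace: list[str]) -> float:
    """Fraction of distinct guidance-trace functions the seed executed.

    Assumed metric for illustration; the paper may define overlap differently.
    """
    targets = set(guidance_trace)
    if not targets:
        return 0.0
    return len(targets & seed.executed) / len(targets)


def schedule(seeds: list[Seed], guidance_trace: list[str]) -> list[Seed]:
    """Return seeds ordered by descending overlap (ties keep input order)."""
    return sorted(
        seeds,
        key=lambda s: overlap_score(s, guidance_trace),
        reverse=True,
    )


# Hypothetical call stack at the vulnerability-triggering point, outermost first.
guidance = ["main", "parse_file", "read_chunk", "copy_bytes"]

seeds = [
    Seed("a", frozenset({"main", "print_usage"})),
    Seed("b", frozenset({"main", "parse_file", "read_chunk"})),
    Seed("c", frozenset({"main", "parse_file"})),
]

order = [s.name for s in schedule(seeds, guidance)]
print(order)
```

In this toy setup, seed `b` covers three of the four guidance functions and is scheduled first, while seed `a` only shares `main` and is scheduled last. A real fuzzer would collect the executed functions through instrumentation and would likely combine this score with other scheduling signals.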