Vulnerabilities related to option combinations pose a significant challenge in software security testing because of their vast search space. Previous research has addressed this challenge mainly through mutation or filtering techniques, which treat all option combinations as equally likely to contain vulnerabilities; they therefore waste considerable time on non-vulnerable targets and achieve low testing efficiency. In this paper, we use carefully designed prompt engineering to drive a large language model (LLM) to predict high-risk option combinations (i.e., those more likely to contain vulnerabilities) and to perform fuzz testing automatically, without human intervention. We developed a tool called ProphetFuzz and evaluated it on a dataset of 52 programs collected from three related studies. The entire experiment consumed 10.44 CPU years. ProphetFuzz predicted 1748 high-risk option combinations at an average cost of only $8.69 per program. Results show that after 72 hours of fuzzing, ProphetFuzz discovered 364 unique vulnerabilities associated with 12.30% of the predicted high-risk option combinations, 32.85% more than the state-of-the-art approach found in the same timeframe. Additionally, using ProphetFuzz, we conducted persistent fuzzing on the latest versions of these programs, uncovering 140 vulnerabilities, of which 93 were confirmed by developers and 21 were assigned CVE numbers.