High-quality datasets of real-world vulnerabilities and their corresponding verifiable exploits are crucial resources in software security research. Yet such resources remain scarce, because creating them demands intensive manual effort and deep security expertise. In this paper, we present CVE-GENIE, an automated, large language model (LLM)-based multi-agent framework designed to reproduce real-world vulnerabilities, provided in Common Vulnerabilities and Exposures (CVE) format, enabling the creation of high-quality vulnerability datasets. Given a CVE entry as input, CVE-GENIE gathers the resources relevant to that CVE, automatically reconstructs the vulnerable environment, and (re)produces a verifiable exploit. Our systematic evaluation highlights the efficiency and robustness of CVE-GENIE's design: the framework successfully reproduces approximately 51% (428 of 841) of the CVEs published in 2024-2025, complete with verifiable exploits, at an average cost of $2.77 per CVE. Our pipeline offers a robust method for generating reproducible CVE benchmarks, which are valuable for diverse applications such as fuzzer evaluation, vulnerability patching, and assessing AI's security capabilities.