Automated vulnerability reproduction from CVE descriptions requires generating executable Proof-of-Concept (PoC) exploits and validating them in the target environment. This process is critical to software security research and practice, yet it remains time-consuming and demands specialized expertise when performed manually. LLM agents show promise for automating the task, but existing approaches often conflate exploring attack directions with fixing implementation details, which leads to unproductive debugging loops when reproduction fails. To address this, we propose CVE2PoC, an LLM-based dual-loop agent framework that follows a plan-execute-evaluate paradigm. The Strategic Planner analyzes vulnerability semantics and target code to produce a structured attack plan. The Tactical Executor generates PoC code and validates it through progressive verification. The Adaptive Refiner evaluates execution results and routes each failure to one of two loops: the Tactical Loop for code-level refinement or the Strategic Loop for replanning the attack strategy. By matching the remediation to the failure type, this dual-loop design lets the framework escape ineffective debugging. On two benchmarks covering 617 real-world vulnerabilities, CVE2PoC achieves reproduction success rates of 82.9% on SecBench.js and 54.3% on PatchEval, outperforming the best baseline by 11.3% and 20.4%, respectively. Human evaluation confirms that the generated PoCs are comparable in code quality to human-written exploits in terms of readability and reusability.
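The failure routing described above can be illustrated with a minimal control-flow sketch. It is not the CVE2PoC implementation: every name here (`Verdict`, `Attempt`, `dual_loop`, the three callables, and the retry budgets `max_plans` and `max_fixes`) is a hypothetical stand-in, and the planner, executor, and refiner are passed in as plain functions rather than LLM-backed components.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class Verdict(Enum):
    """Hypothetical outcome labels an evaluator might assign to one run."""
    SUCCESS = auto()
    CODE_LEVEL_FAILURE = auto()      # plan seems sound, the PoC code is faulty
    STRATEGY_LEVEL_FAILURE = auto()  # the attack direction itself seems wrong


@dataclass
class Attempt:
    plan: str
    poc: str
    verdict: Verdict


def dual_loop(
    plan_fn: Callable[[list], str],                  # stands in for the planner
    execute_fn: Callable[[str, Optional[str]], str],  # stands in for the executor
    evaluate_fn: Callable[[str, str], Verdict],       # stands in for the refiner
    max_plans: int = 3,   # assumed budget for the outer (strategic) loop
    max_fixes: int = 3,   # assumed budget for the inner (tactical) loop
) -> list[Attempt]:
    """Run the nested loops and return the full attempt history."""
    history: list[Attempt] = []
    for _ in range(max_plans):              # outer loop: replan the attack
        plan = plan_fn(history)
        poc: Optional[str] = None
        for _ in range(max_fixes):          # inner loop: refine the PoC code
            poc = execute_fn(plan, poc)
            verdict = evaluate_fn(plan, poc)
            history.append(Attempt(plan, poc, verdict))
            if verdict is Verdict.SUCCESS:
                return history
            if verdict is Verdict.STRATEGY_LEVEL_FAILURE:
                break                       # stop patching code, replan instead
            # CODE_LEVEL_FAILURE: keep the plan and refine the PoC again
    return history
```

In this sketch, the key design choice is the early `break`: a strategy-level verdict abandons the current plan immediately instead of spending the remaining code-fix budget on it, which is the behavior the abstract attributes to matching remediation to failure type.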