Automated vulnerability reproduction from CVE descriptions requires generating executable Proof-of-Concept (PoC) exploits and validating them in target environments. This process is critical in software security research and practice, yet remains time-consuming and demands specialized expertise when performed manually. While LLM agents show promise for automating this task, existing approaches often conflate exploring attack directions with fixing implementation details, which leads to unproductive debugging loops when reproduction fails. To address this, we propose Cve2PoC, an LLM-based dual-loop agent framework following a plan-execute-evaluate paradigm. The Strategic Planner analyzes vulnerability semantics and target code to produce structured attack plans. The Tactical Executor generates PoC code and validates it through progressive verification. The Adaptive Refiner evaluates execution results and routes each failure to one of two loops: the \textit{Tactical Loop} handles code-level refinement, while the \textit{Strategic Loop} handles attack strategy replanning. This dual-loop design enables the framework to escape ineffective debugging by matching remediation to failure type. Evaluation on two benchmarks covering 617 real-world vulnerabilities demonstrates that Cve2PoC achieves 82.9\% and 54.3\% reproduction success rates on SecBench.js and PatchEval, respectively, outperforming the best baseline by 11.3\% and 20.4\%. Human evaluation confirms that the generated PoCs achieve code quality comparable to that of human-written exploits in readability and reusability.
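The routing between the two loops can be illustrated with a minimal Python sketch. It is not the Cve2PoC implementation: the names `Verdict`, `Attempt`, and `reproduce`, the callback signatures, and the retry budgets are all hypothetical, chosen only to show how a failure classified as tactical stays in the inner loop while a failure classified as strategic triggers replanning in the outer loop.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional, Tuple


class Verdict(Enum):
    """Hypothetical failure taxonomy used to route between the two loops."""
    SUCCESS = auto()
    TACTICAL_FAILURE = auto()   # plan looks sound, PoC code needs fixing
    STRATEGIC_FAILURE = auto()  # attack direction itself looks wrong


@dataclass
class Attempt:
    plan: str
    poc: str
    verdict: Verdict


def reproduce(
    cve_description: str,
    plan_attack: Callable[[str, Optional[str]], str],
    write_poc: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str, str], Tuple[Verdict, str]],
    max_plans: int = 3,   # assumed budget, not taken from the paper
    max_fixes: int = 3,   # assumed budget, not taken from the paper
) -> Optional[Attempt]:
    """Return the first successful attempt, or None if all budgets run out."""
    plan_feedback: Optional[str] = None
    for _ in range(max_plans):                 # outer loop: strategic replanning
        plan = plan_attack(cve_description, plan_feedback)
        poc_feedback: Optional[str] = None
        for _ in range(max_fixes):             # inner loop: code-level refinement
            poc = write_poc(plan, poc_feedback)
            verdict, message = evaluate(plan, poc)
            if verdict is Verdict.SUCCESS:
                return Attempt(plan, poc, verdict)
            if verdict is Verdict.STRATEGIC_FAILURE:
                plan_feedback = message        # abandon this plan and replan
                break
            poc_feedback = message             # keep the plan, repair the code
        else:
            # Inner budget exhausted without success: treat as a reason to replan.
            plan_feedback = "tactical budget exhausted"
    return None
```

In this sketch the three callbacks stand in for the Strategic Planner, the Tactical Executor, and the Adaptive Refiner, respectively. Escalating to replanning once the inner budget is exhausted is a design choice of the sketch, not something the abstract specifies.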