AXE: An Agentic eXploit Engine for Confirming Zero-Day Vulnerability Reports

Vulnerability detection tools are widely adopted in software projects, yet they often overwhelm maintainers with false positives and non-actionable reports. Automated exploitation systems can help validate these reports; however, existing approaches typically operate in isolation from detection pipelines, failing to leverage readily available metadata such as vulnerability type and source-code location. In this paper, we investigate how reported security vulnerabilities can be assessed in a realistic grey-box exploitation setting that leverages minimal vulnerability metadata, specifically a CWE classification and a vulnerable code location. We introduce Agentic eXploit Engine (AXE), a multi-agent framework for Web application exploitation that maps lightweight detection metadata to concrete exploits through decoupled planning, code exploration, and dynamic execution feedback. Evaluated on the CVE-Bench dataset, AXE achieves a 30% exploitation success rate, a 3x improvement over state-of-the-art black-box baselines. Even in a single-agent configuration, grey-box metadata yields a 1.75x performance gain. Systematic error analysis shows that most failed attempts arise from specific reasoning gaps, including misinterpreted vulnerability semantics and unmet execution preconditions. For successful exploits, AXE produces actionable, reproducible proof-of-concept artifacts, demonstrating its utility in streamlining Web vulnerability triage and remediation. We further evaluate AXE's generalizability through a case study on a recent real-world vulnerability not included in CVE-Bench.
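To make the grey-box input concrete, the following is a minimal sketch of how the two metadata items named in the abstract (a CWE classification and a vulnerable code location) might be represented before being handed to an exploitation pipeline. The `VulnReport` type, the `parse_report` helper, and the `CWE-<n>:<path>:<line>` string format are all hypothetical and chosen for illustration only; the abstract does not specify AXE's actual input format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnReport:
    """Hypothetical grey-box input: a CWE class plus a code location."""
    cwe_id: str     # e.g. "CWE-89"
    file_path: str  # path of the flagged source file
    line: int       # flagged line number (1-based)


def parse_report(raw: str) -> VulnReport:
    """Parse an assumed 'CWE-<n>:<path>:<line>' string into a VulnReport.

    The string format is an assumption made for this sketch.
    """
    try:
        # Split the CWE prefix off the front and the line number off the back,
        # so that any colons inside the path itself are preserved.
        cwe_id, rest = raw.split(":", 1)
        file_path, line_text = rest.rsplit(":", 1)
        line = int(line_text)
    except ValueError as exc:
        raise ValueError(f"malformed report: {raw!r}") from exc

    if not (cwe_id.startswith("CWE-") and cwe_id[4:].isdigit()):
        raise ValueError(f"malformed CWE identifier: {cwe_id!r}")
    if not file_path or line < 1:
        raise ValueError(f"malformed code location in: {raw!r}")
    return VulnReport(cwe_id=cwe_id, file_path=file_path, line=line)


if __name__ == "__main__":
    print(parse_report("CWE-89:app/views.py:42"))
```

A record of this kind is deliberately small: it carries only what a detection tool would typically already report, which is the premise of the grey-box setting described above.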
