Large-scale web applications are widely deployed with complex third-party components and therefore inherit the security risks of vulnerabilities in those components. Security assessment is required to determine whether such known vulnerabilities remain practically exploitable in real applications. Penetration testing is a widely adopted approach that validates exploitability by launching concrete attacks against known vulnerabilities in real-world black-box systems. However, existing approaches often fail to automatically generate reliable exploits, which limits their effectiveness in practical security assessment. This limitation stems mainly from two challenges: (1) precisely triggering vulnerabilities with correct technical details, and (2) adapting exploits to diverse real-world deployment settings. In this paper, we propose AutoEG, a fully automated multi-agent framework for exploit generation targeting black-box web applications. AutoEG operates in two phases. First, it extracts precise vulnerability trigger logic from unstructured vulnerability information and encapsulates it into reusable trigger functions. Second, it applies these trigger functions to concrete attack objectives and iteratively refines exploits through feedback-driven interaction with the target application. We evaluate AutoEG on 104 real-world vulnerabilities with 29 attack objectives, resulting in 660 exploitation tasks and 55,440 exploit attempts. AutoEG achieves an average success rate of 82.41%, substantially outperforming state-of-the-art baselines, the best of which reaches only 32.88%.
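The two-phase structure described in the abstract (a reusable trigger function, then feedback-driven refinement against the target) can be illustrated with a minimal sketch. This is not AutoEG's implementation: the names `Feedback`, `refine`, `toy_trigger`, `toy_check`, and `toy_revise` are hypothetical, and the "target" is simulated in memory as a service that only answers under a `/v2` path prefix, standing in for a deployment-specific difference.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    """Observation returned after one attempt (hypothetical structure)."""
    success: bool
    hint: str


def refine(
    trigger: Callable[[str], str],
    check: Callable[[str], Feedback],
    revise: Callable[[str, Feedback], str],
    request: str,
    max_rounds: int = 5,
) -> Optional[Tuple[str, int]]:
    """Call the trigger, inspect the response, and revise until success.

    Returns the working request and the round it succeeded in, or None
    if no attempt succeeded within max_rounds.
    """
    for round_no in range(1, max_rounds + 1):
        response = trigger(request)
        feedback = check(response)
        if feedback.success:
            return request, round_no
        request = revise(request, feedback)
    return None


# Toy stand-ins (assumptions for illustration only).
def toy_trigger(request: str) -> str:
    # Simulated target: only paths mounted under /v2 are served.
    if request.startswith("/v2/"):
        return "200 marker"
    return "404 try /v2"


def toy_check(response: str) -> Feedback:
    # Success is judged from the response status prefix.
    return Feedback(success=response.startswith("200"), hint=response)


def toy_revise(request: str, feedback: Feedback) -> str:
    # Adapt the request using the hint observed from the target.
    if "/v2" in feedback.hint and not request.startswith("/v2"):
        return "/v2" + request
    return request


if __name__ == "__main__":
    print(refine(toy_trigger, toy_check, toy_revise, "/status"))
```

In this sketch the trigger function is kept separate from the refinement loop, so the same trigger could in principle be reused with different objectives while only `check` and `revise` change. How AutoEG divides these responsibilities among its agents is not specified in the abstract.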