Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of the 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost: certain ARTEMIS variants cost $18/hour, compared with $60/hour for professional penetration testers. We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.
