For over a decade, cybersecurity has relied on the scarcity of human labor to constrain attackers: building sophisticated exploits demands deep expertise and manual effort, so adversaries have been limited to either hand-crafted attacks on high-value targets or generic automated attacks at scale. Defenders have therefore assumed that adversaries cannot afford tailored attacks at scale. AI agents break this balance by automating vulnerability discovery and exploitation across thousands of targets, needing only small success rates to remain profitable. Model developers currently focus on preventing misuse through data filtering, safety alignment, and output guardrails, but such protections fail against adversaries who control open-weight models, bypass safety controls, or develop offensive capabilities independently. We argue that AI-agent-driven cyber attacks are inevitable and require a fundamental shift in defensive strategy. In this position paper, we explain why existing defenses cannot stop adaptive adversaries and argue that defenders must develop offensive security intelligence of their own. We propose three actions for building frontier offensive AI capabilities responsibly. First, construct comprehensive benchmarks covering the full attack lifecycle. Second, advance from workflow-based to trained agents capable of discovering in-the-wild vulnerabilities at scale. Third, implement governance that restricts offensive agents to audited cyber ranges, stages release by capability tier, and distills findings into safe, defense-only agents. We strongly recommend treating offensive AI capabilities as essential defensive infrastructure: containing cybersecurity risks requires mastering these capabilities in controlled settings before adversaries do.