As frontier AI models become more capable, evaluating their potential to enable cyberattacks is crucial for ensuring the safe development of Artificial General Intelligence (AGI). Current cyber evaluation efforts are often ad hoc, lacking systematic analysis of attack phases and guidance on targeted defenses. This work introduces a novel evaluation framework that addresses these limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations and conduct AI-enabled adversary emulation for red teaming. Our approach adapts existing cyberattack chain frameworks for AI systems. We analyzed over 12,000 real-world instances of AI use in cyberattacks catalogued by Google's Threat Intelligence Group. Based on this analysis, we curated seven representative cyberattack chain archetypes and conducted a bottleneck analysis to pinpoint potential AI-driven cost disruptions. Our benchmark comprises 50 new challenges spanning various cyberattack phases. Using this benchmark, we devise targeted cybersecurity model evaluations, report on AI's potential to amplify offensive capabilities across specific attack phases, and offer recommendations for prioritizing defenses. We believe this represents the most comprehensive AI cyber risk evaluation framework published to date.