Artificial intelligence (AI) is increasingly being used to augment and automate cyber operations, altering the scale, speed, and accessibility of malicious activity. These shifts raise urgent questions about when AI systems introduce unacceptable or intolerable cyber risk, and about how risk thresholds should be identified before harms materialize at scale. In recent years, industry, government, and civil society actors have begun to articulate such thresholds for advanced AI systems, with the goal of signaling when models meaningfully amplify cyber threats, for example by automating multi-stage intrusions, enabling zero-day discovery, or lowering the expertise required for sophisticated attacks. However, current approaches to determining these thresholds remain fragmented and limited: many rely solely on capability benchmarks or narrow threat scenarios and are only weakly connected to empirical evidence. This paper proposes a structured approach to developing and evaluating AI cyber risk thresholds that is probabilistic, evidence-based, and operationalizable. Building on our prior work highlighting the limitations of capability assessments alone, we make three core contributions. First, we analyze existing industry cyber thresholds and identify common threshold elements as well as recurring methodological shortcomings. Second, we propose Bayesian networks as a tool for modeling AI-enabled cyber risk, enabling the integration of heterogeneous evidence, explicit representation of uncertainty, and continuous updating as new information emerges. Third, we illustrate this approach through a focused case study of AI-augmented phishing, demonstrating how qualitative threat insights can be decomposed into measurable variables and recombined into structured risk estimates that better capture how AI changes attacker behavior and outcomes.
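To make the Bayesian network idea concrete, the following is a minimal Python sketch of a toy discrete network for AI-augmented phishing, evaluated by brute-force enumeration. The four nodes, the arc structure, and every probability value are hypothetical placeholders chosen for illustration; they are not the paper's calibrated model, and no real phishing data is involved.

```python
# A minimal illustrative sketch, NOT the paper's actual model: a toy discrete
# Bayesian network for AI-augmented phishing, with inference by enumerating
# all assignments. All node names and probabilities are hypothetical.

from itertools import product

# Binary nodes: A = attacker uses AI, Q = lure quality is high,
# C = target clicks the lure, S = credential compromise succeeds.
# Assumed structure: A -> Q -> C -> S.

P_A = {True: 0.4, False: 0.6}                # assumed prior on AI use
P_Q = {True: {True: 0.8, False: 0.2},        # P(Q | A): AI raises lure quality
       False: {True: 0.3, False: 0.7}}
P_C = {True: {True: 0.25, False: 0.75},      # P(C | Q): better lures draw more clicks
       False: {True: 0.05, False: 0.95}}
P_S = {True: {True: 0.5, False: 0.5},        # P(S | C): compromise requires a click
       False: {True: 0.0, False: 1.0}}

def joint(a, q, c, s):
    """Joint probability of one full assignment under the factorization."""
    return P_A[a] * P_Q[a][q] * P_C[q][c] * P_S[c][s]

def prob(query, evidence=None):
    """P(query | evidence), computed by summing over all assignments."""
    evidence = evidence or {}
    num = den = 0.0
    for a, q, c, s in product([True, False], repeat=4):
        world = {"A": a, "Q": q, "C": c, "S": s}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the observed evidence
        p = joint(a, q, c, s)
        den += p
        if all(world[k] == v for k, v in query.items()):
            num += p
    return num / den

# Risk estimates with and without AI uplift, plus a diagnostic update:
print(f"P(compromise | AI used) = {prob({'S': True}, {'A': True}):.3f}")
print(f"P(compromise | no AI)   = {prob({'S': True}, {'A': False}):.3f}")
print(f"P(AI used | compromise) = {prob({'A': True}, {'S': True}):.3f}")
```

Under these placeholder numbers, conditioning on AI use roughly doubles the estimated compromise probability (0.105 versus 0.055), and the same machinery runs the diagnostic query in reverse, revising the belief that AI was involved after a compromise is observed. This is the kind of structured risk estimation and continuous evidence updating the approach is meant to support, here with hand-set probabilities standing in for empirically derived ones.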