AI Loss of Control Incident Management: Response & Resilience

Recent research demonstrating that AI systems can exhibit deception and shutdown resistance suggests that AI loss of control (LOC) is an urgent policy concern, yet the current literature focuses almost exclusively on alignment and prevention. To address this gap, this paper introduces a foundational framework and taxonomy for managing catastrophic AI LOC incidents. The taxonomy's first level distinguishes between scenarios where regaining control is 'extremely costly' and those where it is 'impossible'. Impossible scenarios demand immediate resilience investments that fundamentally restrict an AI's attack surface, whereas extremely costly scenarios require active incident management via Containment and Threat Neutralization. The framework further divides these manageable events into accidental LOC (requiring automated circuit-breaker responses) and adversarial LOC (requiring graduated escalatory measures). By mapping three severity classes to specific scenario matrices, this paper provides a concrete, proportional guide for managing unprecedented AI risks.
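
The two-level taxonomy described above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the paper's own formalization: the names `Recoverability`, `Origin`, and `recommended_response` are hypothetical, and the returned strings only paraphrase the response categories named in the abstract. The three severity classes are left out because the abstract does not define them.

```python
from enum import Enum


class Recoverability(Enum):
    """First taxonomy level: how hard it is to regain control (hypothetical names)."""
    EXTREMELY_COSTLY = "extremely costly"
    IMPOSSIBLE = "impossible"


class Origin(Enum):
    """Second level, applied only to manageable ('extremely costly') incidents."""
    ACCIDENTAL = "accidental"
    ADVERSARIAL = "adversarial"


def recommended_response(recoverability, origin=None):
    """Return the response category the abstract associates with a scenario."""
    if recoverability is Recoverability.IMPOSSIBLE:
        # Control cannot be regained, so the abstract points to resilience
        # investment rather than incident management.
        return "resilience investment: restrict the AI's attack surface"
    if origin is Origin.ACCIDENTAL:
        return "incident management: automated circuit-breaker response"
    if origin is Origin.ADVERSARIAL:
        return "incident management: graduated escalatory measures"
    # An 'extremely costly' scenario needs an origin to pick a response.
    raise ValueError("origin is required for 'extremely costly' scenarios")
```

For example, `recommended_response(Recoverability.EXTREMELY_COSTLY, Origin.ADVERSARIAL)` selects the graduated escalatory path, while any 'impossible' scenario maps to resilience investment regardless of origin.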

