To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As the general capabilities of Large Language Models (LLMs) rapidly evolve and agentic AI proliferates, this version of the risk analysis technical report presents an updated and more granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R\&D, and self-replication. Specifically, we introduce more complex scenarios for cyber offense. For persuasion and manipulation, we evaluate the risk of LLM-to-LLM persuasion on newly released LLMs. For strategic deception and scheming, we add a new experiment on emergent misalignment. For uncontrolled AI R\&D, we focus on the ``mis-evolution'' of agents as they autonomously expand their memory substrates and toolsets; in addition, we monitor and evaluate the safety performance of OpenClaw during its interactions on the Moltbook platform. For self-replication, we introduce a new resource-constrained scenario. More importantly, we propose and validate a series of robust mitigation strategies to address these emerging threats, providing a preliminary, actionable technical pathway for the secure deployment of frontier AI. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.