Following the AI Seoul Summit in 2024, twelve AI companies published frontier AI safety frameworks (Frameworks) outlining their approaches to managing catastrophic risks from advanced AI systems. Emerging legislation increasingly treats these Frameworks as external accountability mechanisms, incorporating them into reporting requirements. But what do the Frameworks actually commit each company to do? This study assesses the twelve Frameworks against 65 weighted criteria across four dimensions: risk identification, risk analysis & evaluation, risk treatment, and risk governance. Our criteria adapt established risk management principles from other high-risk industries (e.g., aviation, nuclear power) to the frontier AI context, following Campos et al. (2025). Overall scores range from 8% (Cohere) to 34% (Anthropic), with a median of 18%, and many aspects are missing or under-specified. These low scores may be unsurprising given how new AI risk management is compared with industries that have decades of practice. Even so, the current Frameworks are of limited use as accountability mechanisms: their vague commitments make it difficult to predict company decisions, assess whether planned responses are adequate, or determine whether commitments have been kept. Higher scores appear feasible within current constraints: a company adopting every leading practice already found among its peers would score 51%, almost triple the median.