Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies

Miles Brundage, Noemi Dreksler, Aidan Homewood, Sean McGregor, Patricia Paskov, Conrad Stosz, Girish Sastry, A. Feder Cooper, George Balston, Steven Adler, Stephen Casper, Markus Anderljung, Grace Werner, Soren Mindermann, Vasilios Mavroudis, Ben Bucknall, Charlotte Stix, Jonas Freund, Lorenzo Pacchiardi, Jose Hernandez-Orallo, Matteo Pistillo, Michael Chen, Chris Painter, Dean W. Ball, Cullen O'Keefe, Gabriel Weil, Ben Harack, Graeme Finley, Ryan Hassan, Scott Emmons, Charles Foster, Anka Reuel, Bri Treece, Yoshua Bengio, Daniel Reti, Rishi Bommasani, Cristian Trout, Ali Shahin Shamsabadi, Rajiv Dattani, Adrian Weller, Robert Trager, Jaime Sevilla, Lauren Wagner, Lisa Soder, Ketan Ramakrishnan, Henry Papadatos, Malcolm Murray, Ryan Tovcimak

Frontier AI is becoming critical societal infrastructure, but outsiders lack reliable ways to judge whether leading developers' safety and security claims are accurate and whether their practices meet relevant standards. Compared to other social and technological systems we rely on daily, such as consumer products, corporate financial statements, and food supply chains, AI is subject to less rigorous third-party scrutiny along several dimensions. Ambiguity about whether AI systems are trustworthy can discourage deployment in contexts where the technology could be beneficial, and make deployment more likely in contexts where it is dangerous. Public transparency alone cannot close this gap: many safety- and security-relevant details are legitimately confidential, and interpreting them requires expertise. We define frontier AI auditing as rigorous third-party verification of frontier AI developers' safety and security claims, and evaluation of their systems and practices against relevant standards, based on deep, secure access to non-public information. To make rigor legible and comparable, we introduce AI Assurance Levels (AAL-1 to AAL-4), ranging from time-bounded system audits to continuous, deception-resilient verification.
