Over the past year, artificial intelligence (AI) companies have been increasingly adopting AI safety frameworks. These frameworks outline how companies intend to keep the potential risks associated with developing and deploying frontier AI systems to an acceptable level. Major players like Anthropic, OpenAI, and Google DeepMind have already published their frameworks, while another 13 companies have signaled their intent to release similar frameworks by February 2025. Given their central role in AI companies' efforts to identify and address unacceptable risks from their systems, AI safety frameworks warrant significant scrutiny. To enable governments, academia, and civil society to pass judgment on these frameworks, this paper proposes a grading rubric. The rubric consists of seven evaluation criteria and 21 indicators that concretize the criteria. Each criterion can be graded on a scale from A (gold standard) to F (substandard). The paper also suggests three methods for applying the rubric: surveys, Delphi studies, and audits. The purpose of the grading rubric is to enable nuanced comparisons between frameworks, identify potential areas of improvement, and promote a race to the top in responsible AI development.