General-purpose AI (GPAI) evaluations have been proposed as a promising way of identifying and mitigating systemic risks posed by AI development and deployment. While GPAI evaluations play an increasingly central role in institutional decision- and policy-making -- including by way of the European Union AI Act's mandate to conduct evaluations of GPAI models presenting systemic risk -- no standards exist to date to promote their quality or legitimacy. To strengthen GPAI evaluations in the EU, currently the first and only jurisdiction to mandate GPAI evaluations, we outline four desiderata for GPAI evaluations: internal validity, external validity, reproducibility, and portability. To uphold these desiderata in a dynamic environment of continuously evolving risks, we propose a dedicated EU GPAI Evaluation Standards Taskforce, to be housed within the bodies established by the EU AI Act. We outline the responsibilities of the Taskforce, specify the GPAI provider commitments that would facilitate its success, discuss the Taskforce's potential impact on global AI governance, and address potential sources of failure that policymakers should heed.