Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers

As artificial intelligence (AI) models are scaled up, new capabilities can emerge unintentionally and unpredictably, some of which might be dangerous. In response, dangerous capabilities evaluations have emerged as a new risk assessment tool. But what should frontier AI developers do if sufficiently dangerous capabilities are in fact discovered? This paper focuses on one possible response: coordinated pausing. It proposes an evaluation-based coordination scheme that consists of five main steps: (1) Frontier AI models are evaluated for dangerous capabilities. (2) Each time a model fails a set of evaluations, the developer pauses certain research and development activities. (3) Other developers are notified whenever a model with dangerous capabilities has been discovered, and they also pause related research and development activities. (4) The discovered capabilities are analyzed and adequate safety precautions are put in place. (5) Developers only resume their paused activities if certain safety thresholds are reached. The paper also discusses four concrete versions of that scheme. In the first version, pausing is completely voluntary and relies on public pressure on developers. In the second version, participating developers collectively agree to pause under certain conditions. In the third version, a single auditor evaluates models of multiple developers who agree to pause if any model fails a set of evaluations. In the fourth version, developers are legally required to run evaluations and pause if dangerous capabilities are discovered. Finally, the paper discusses the desirability and feasibility of the proposed coordination scheme. It concludes that coordinated pausing is a promising mechanism for tackling emerging risks from frontier AI models. However, a number of practical and legal obstacles need to be overcome, especially how to avoid violations of antitrust law.
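To make the control flow of the five steps concrete, the following is a minimal toy sketch in Python. It is not taken from the paper: the names `Developer`, `PauseScheme`, `report_evaluation`, and `try_resume` are hypothetical, the evaluation result is reduced to a single boolean, and "certain research and development activities" is collapsed into one status flag per developer.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    ACTIVE = auto()
    PAUSED = auto()


@dataclass
class Developer:
    # Hypothetical stand-in for a frontier AI developer.
    name: str
    status: Status = Status.ACTIVE


@dataclass
class PauseScheme:
    # Hypothetical coordinator; the abstract does not prescribe any data model.
    developers: list[Developer] = field(default_factory=list)

    def report_evaluation(self, developer: Developer, passed: bool) -> None:
        # Step 1: a model has been evaluated for dangerous capabilities.
        if passed:
            return
        # Step 2: the developer whose model failed pauses.
        developer.status = Status.PAUSED
        # Step 3: the other participating developers are notified and pause too.
        for other in self.developers:
            other.status = Status.PAUSED

    def try_resume(self, safety_thresholds_met: bool) -> bool:
        # Step 4 (analysis and safety precautions) is assumed to happen
        # outside this sketch and is summarized by the boolean argument.
        if not safety_thresholds_met:
            return False
        # Step 5: activities resume only once the thresholds are reached.
        for dev in self.developers:
            dev.status = Status.ACTIVE
        return True


lab_a, lab_b = Developer("Lab A"), Developer("Lab B")
scheme = PauseScheme([lab_a, lab_b])
scheme.report_evaluation(lab_a, passed=False)   # both labs are now paused
resumed = scheme.try_resume(safety_thresholds_met=True)
```

The four versions described in the abstract differ mainly in who triggers `report_evaluation` and what makes the pause binding (public pressure, a joint agreement, a shared auditor, or a legal requirement), which this sketch deliberately leaves out.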
