As artificial intelligence (AI) models are scaled up, new capabilities can emerge unintentionally and unpredictably, some of which might be dangerous. In response, dangerous capabilities evaluations have emerged as a new risk assessment tool. But what should frontier AI developers do if sufficiently dangerous capabilities are in fact discovered? This paper focuses on one possible response: coordinated pausing. It proposes an evaluation-based coordination scheme that consists of five main steps: (1) Frontier AI models are evaluated for dangerous capabilities. (2) Whenever a model fails a set of evaluations, the developer pauses certain research and development activities. (3) Other developers are notified whenever a model with dangerous capabilities has been discovered, and they also pause related research and development activities. (4) The discovered capabilities are analyzed and adequate safety precautions are put in place. (5) Developers only resume their paused activities if certain safety thresholds are reached. The paper also discusses four concrete versions of that scheme. In the first version, pausing is completely voluntary and relies on public pressure on developers. In the second version, participating developers collectively agree to pause under certain conditions. In the third version, a single auditor evaluates models of multiple developers who agree to pause if any model fails a set of evaluations. In the fourth version, developers are legally required to run evaluations and pause if dangerous capabilities are discovered. Finally, the paper discusses the desirability and feasibility of the proposed coordination scheme. It concludes that coordinated pausing is a promising mechanism for tackling emerging risks from frontier AI models. However, a number of practical and legal obstacles need to be overcome, in particular the question of how to avoid violating antitrust law.