We present a quantitative model for tracking dangerous AI capabilities over time. Our goal is to help the policy and research community visualise how dangerous capability testing can give us an early warning about approaching AI risks. We first use the model to provide a novel introduction to dangerous capability testing and to show how this testing can directly inform policy. Decision makers in AI labs and government often set policy that is sensitive to the estimated danger of AI systems, and may wish to set policies that trigger once a set danger threshold is crossed. The model helps us reason about these policy choices. We then run simulations to illustrate how we might fail to test for dangerous capabilities. To summarise, failures in dangerous capability testing may manifest in two ways: higher bias in our estimates of AI danger, or larger lags in threshold monitoring. We highlight two drivers of these failure modes: uncertainty around dynamics in AI capabilities, and competition between frontier AI labs. Effective AI policy demands that we address these failure modes and their drivers. Even when optimally targeting testing resources is challenging, we show how delays in testing can harm AI policy. We offer preliminary recommendations for building an effective testing ecosystem for dangerous capabilities, and advise on a research agenda.