Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we introduce and define a family of approaches to AI safety, which we refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems that are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest several potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
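To make the three-component architecture concrete, the following is a minimal illustrative sketch, not an implementation from this paper: hypothetical Python interfaces (all names such as WorldModel, SafetySpec, Verifier, and ProofCertificate are our own, assumed for illustration) showing how a world model, a safety specification, and a verifier that emits an auditable certificate might fit together.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol


class WorldModel(Protocol):
    """Mathematical description of how the AI system affects the outside world."""

    def step(self, state: Any, action: Any) -> Any:
        """Return the (distribution over the) next world state."""
        ...


class SafetySpec(Protocol):
    """Mathematical description of which effects on the world are acceptable."""

    def is_acceptable(self, trajectory: Any) -> bool:
        """Decide whether a trajectory of world states satisfies the spec."""
        ...


@dataclass
class ProofCertificate:
    """Auditable evidence that a policy satisfies the spec w.r.t. the model."""
    claim: str        # the verified safety property, stated formally
    derivation: str   # e.g. a machine-checkable proof term


class Verifier(Protocol):
    def verify(
        self, policy: Any, world_model: WorldModel, spec: SafetySpec
    ) -> Optional[ProofCertificate]:
        """Return a certificate if the policy provably satisfies the spec
        relative to the world model; return None otherwise."""
        ...
```

The key design point this sketch is meant to convey is that the verifier's guarantee is always relative to the world model and the specification: a certificate attests to safety only under those two mathematical descriptions.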