Sociotechnical Safety Evaluation of Generative AI Systems

Laura Weidinger, Maribeth Rauh, Nahema Marchal, Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, Stevie Bergman, Jackie Kay, Conor Griffin, Ben Bariach, Iason Gabriel, Verena Rieser, William Isaac

From arXiv; main paper pp. 1-29, 5 figures, 2 tables

Generative AI systems produce a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. In this paper, we make two main contributions toward establishing such evaluations. First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks. This framework encompasses capability evaluations, which are the main current approach to safety evaluation. It then reaches further by building on system safety principles, particularly the insight that context determines whether a given capability may cause harm. To account for relevant context, our framework adds human interaction and systemic impacts as additional layers of evaluation. Second, we survey the current state of safety evaluation of generative AI systems and create a repository of existing evaluations. Three salient evaluation gaps emerge from this analysis. We propose ways forward for closing these gaps, outlining practical steps as well as roles and responsibilities for different actors. Sociotechnical safety evaluation is a tractable approach to the robust and comprehensive safety evaluation of generative AI systems.

