Generative AI systems give rise to a range of risks. To ensure the safety of generative AI systems, these risks must be evaluated. In this paper, we make two main contributions toward establishing such evaluations. First, we propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks. This framework encompasses capability evaluations, which are the main current approach to safety evaluation. It then reaches further by building on system safety principles, particularly the insight that context determines whether a given capability may cause harm. To account for relevant context, our framework adds human interaction and systemic impacts as additional layers of evaluation. Second, we survey the current state of safety evaluation of generative AI systems and create a repository of existing evaluations. Three salient evaluation gaps emerge from this analysis. We propose ways to close these gaps, outlining practical steps as well as roles and responsibilities for different actors. Sociotechnical safety evaluation is a tractable approach to the robust and comprehensive safety evaluation of generative AI systems.