Generative AI systems across modalities, spanning text (including code), image, audio, and video, have broad social impacts, but there is no official standard for how to evaluate those impacts or for which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach for evaluating a base generative AI system in any modality across two overarching categories: what can be evaluated in a base system independent of context, and what can be evaluated in a societal context. Importantly, this refers to base systems that have no predetermined application or deployment context, encompassing a model itself as well as system components such as training data. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. The suggested evaluation methods apply to the listed generative modalities, and our analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in a broader societal context, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm.