Generative AI systems across modalities, spanning text (including code), image, audio, and video, have broad social impacts, but there is no official standard for evaluating those impacts or for determining which impacts should be evaluated. In this paper, we present a guide that moves toward a standard approach for evaluating a base generative AI system in any modality across two overarching categories: what can be evaluated in a base system independent of context, and what can be evaluated in a societal context. Importantly, this refers to base systems that have no predetermined application or deployment context, encompassing a model itself as well as system components such as training data. Our framework for a base system defines seven categories of social impact: bias, stereotypes, and representational harms; cultural values and sensitive content; disparate performance; privacy and data protection; financial costs; environmental costs; and data and content moderation labor costs. Suggested methods of evaluation apply to the listed generative modalities, and our analyses of the limitations of existing evaluations serve as a starting point for necessary investment in future evaluations. We offer five overarching categories for what can be evaluated in a broader societal context, each with its own subcategories: trustworthiness and autonomy; inequality, marginalization, and violence; concentration of authority; labor and creativity; and ecosystem and environment. Each subcategory includes recommendations for mitigating harm.