Text-to-Image (TTI) generative models have made great progress in recent years in their ability to generate complex, high-quality imagery. At the same time, these models have been shown to suffer from harmful biases, including exaggerated societal biases (e.g., gender, ethnicity) as well as incidental correlations that limit a model's ability to generate diverse imagery. In this paper, we propose a general approach to study and quantify a broad spectrum of biases, for any TTI model and any prompt, using counterfactual reasoning. Unlike prior work that evaluates generated images along a predefined set of bias axes, our approach automatically identifies potential biases that may be relevant to the given prompt and measures them. In addition, we complement quantitative scores with post-hoc explanations in terms of semantic concepts present in the generated images. We show that our method is uniquely capable of explaining complex, multi-dimensional biases through semantic concepts, as well as the intersectionality between different biases, for any given prompt. We conduct extensive user studies to demonstrate that the results of our method and analysis are consistent with human judgements.