Text-to-Image (TTI) generative models have made great progress in the past few years in their ability to generate complex, high-quality imagery. At the same time, these models have been shown to suffer from harmful biases. These include exaggerated societal biases (e.g., gender, ethnicity) and incidental correlations that limit the models' ability to generate more diverse imagery. In this paper, we propose a general approach that uses counterfactual reasoning to study and quantify a broad spectrum of biases for any TTI model and any prompt. Unlike other works that evaluate generated images on a predefined set of bias axes, our approach automatically identifies potential biases that might be relevant to the given prompt and measures those biases. In addition, our paper extends quantitative scores with post-hoc explanations in terms of semantic concepts in the generated images. We show that our method is uniquely capable of explaining complex multi-dimensional biases through semantic concepts, as well as the intersectionality between different biases for any given prompt. We perform extensive user studies to demonstrate that the results of our method and analysis are consistent with human judgements.