Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

The recent advancement of large and powerful models with Text-to-Image (T2I) generation abilities, such as OpenAI's DALL-E 3 and Google's Gemini, enables users to generate high-quality images from textual prompts. However, it has become increasingly evident that even simple prompts can cause T2I models to exhibit conspicuous social bias in generated images. Such bias might lead to both allocational and representational harms in society, further marginalizing minority groups. Noting this problem, a large body of recent work has been dedicated to investigating different dimensions of bias in T2I systems. However, an extensive review of these studies is lacking, which hinders a systematic understanding of current progress and research gaps. We present the first extensive survey on bias in T2I generative models. In this survey, we review prior studies on three dimensions of bias: Gender, Skintone, and Geo-Culture. Specifically, we discuss how these works define, evaluate, and mitigate different aspects of bias. We find that: (1) while gender and skintone biases are widely studied, geo-cultural bias remains under-explored; (2) most works on gender and skintone bias investigate occupational association, while other aspects are less frequently studied; (3) almost all gender bias works overlook non-binary identities; (4) evaluation datasets and metrics are scattered, with no unified framework for measuring biases; and (5) current mitigation methods fail to resolve biases comprehensively. Based on these limitations, we point out future research directions that contribute to human-centric definitions, evaluations, and mitigation of biases. We hope to highlight the importance of studying biases in T2I systems and to encourage future efforts to holistically understand and tackle biases, building fair and trustworthy T2I technologies for everyone.
