Recent progress in Text-to-Image (T2I) generative models has enabled high-quality image generation. As performance and accessibility increase, these models are gaining significant traction and popularity: ensuring their fairness and safety is a priority to prevent the dissemination and perpetuation of biases. However, existing studies in bias detection focus on closed sets of predefined biases (e.g., gender, ethnicity). In this paper, we propose a general framework to identify, quantify, and explain biases in an open-set setting, i.e., without requiring a predefined set of biases. The pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions. Next, these captions are used by the target generative model to generate a set of images. Finally, Visual Question Answering (VQA) is leveraged for bias evaluation. We present two variations of this framework: OpenBias and GradBias. OpenBias detects and quantifies biases, while GradBias determines the contribution of individual prompt words to biases. OpenBias effectively detects both well-known and novel biases related to people, objects, and animals, and aligns closely with existing closed-set bias detection methods and human judgment. GradBias shows that neutral words can significantly influence biases, and it outperforms several baselines, including state-of-the-art foundation models. Code is available at: https://github.com/Moreno98/GradBias.
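To make the final evaluation stage concrete, the sketch below illustrates one simple way to quantify a bias from VQA answers collected over a set of generated images: the more skewed the answer distribution, the stronger the bias. This is an illustrative normalized-imbalance score under our own assumptions, not the paper's exact metric; the function name and scoring rule are hypothetical.

```python
from collections import Counter

def bias_severity(vqa_answers):
    """Score a candidate bias from per-image VQA answers.

    Hypothetical illustration: 0.0 means the answers are uniformly
    distributed over the observed classes (no measurable bias),
    1.0 means every generated image received the same answer
    (maximal bias). Not the metric used by OpenBias/GradBias.
    """
    counts = Counter(vqa_answers)
    n = len(vqa_answers)
    k = len(counts)
    if k <= 1:
        # All images answered identically: fully skewed.
        return 1.0
    # Share of the majority answer, rescaled so that a perfectly
    # uniform distribution maps to 0 and total agreement maps to 1.
    max_share = max(counts.values()) / n
    uniform_share = 1.0 / k
    return (max_share - uniform_share) / (1.0 - uniform_share)

# Example: VQA answers for 10 images generated from one caption.
answers = ["male"] * 8 + ["female"] * 2
print(bias_severity(answers))  # 0.6 (strongly skewed toward "male")
```

In this toy scoring rule, a 50/50 answer split yields 0.0 and an 80/20 split yields 0.6; a real evaluation would additionally account for VQA confidence and caption context, as the framework's LLM-proposed questions do.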