Text-to-image generative models are becoming increasingly popular and accessible to the general public. As these models see large-scale deployment, it is necessary to deeply investigate their safety and fairness so that they do not disseminate and perpetuate biases. However, existing works focus on detecting closed sets of biases defined a priori, limiting studies to well-known concepts. In this paper, we tackle the challenge of open-set bias detection in text-to-image generative models, presenting OpenBias, a new pipeline that identifies and quantifies the severity of biases agnostically, without access to any precompiled set. OpenBias has three stages. First, we leverage a Large Language Model (LLM) to propose candidate biases given a set of captions. Second, the target generative model produces images from the same captions. Third, a Vision Question Answering (VQA) model recognizes the presence and extent of the previously proposed biases. We study the behavior of Stable Diffusion 1.5, 2, and XL, emphasizing new biases never investigated before. Via quantitative experiments, we demonstrate that OpenBias agrees with current closed-set bias detection methods and with human judgement.
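As a rough illustration (not the authors' released code), the three stages could be wired together as in the sketch below. The `propose_biases` helper is a hypothetical stand-in for the LLM query of stage one, and the Stable Diffusion 1.5 and ViLT checkpoints are assumptions chosen for concreteness; the paper's actual model choices and prompting scheme may differ.

```python
# Minimal sketch of the three-stage OpenBias pipeline (illustrative, not official).
from collections import Counter

import torch
from diffusers import StableDiffusionPipeline
from transformers import pipeline


def propose_biases(caption: str) -> list[dict]:
    """Stage 1 (hypothetical stub): ask an LLM to propose candidate biases.

    A real implementation would prompt an LLM to return, for each candidate
    bias, a VQA-style question and the set of possible answer classes.
    """
    return [{"bias": "person gender",
             "question": "What is the gender of the person?",
             "classes": ["man", "woman"]}]


# Stage 2: the target generative model under audit (checkpoint is an assumption).
generator = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Stage 3: a VQA model that checks each proposed bias in the generated images.
vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")


def audit(caption: str, images_per_caption: int = 4) -> dict[str, Counter]:
    """Generate images for a caption and tally VQA answers per proposed bias."""
    candidates = propose_biases(caption)
    images = generator(caption,
                       num_images_per_prompt=images_per_caption).images
    tallies = {c["bias"]: Counter() for c in candidates}
    for image in images:
        for c in candidates:
            answer = vqa(image=image, question=c["question"])[0]["answer"]
            if answer in c["classes"]:  # keep only in-domain answers
                tallies[c["bias"]][answer] += 1
    return tallies


print(audit("A photo of a doctor at work"))
```

Under this reading, the severity of a bias is reflected in how skewed the tally of VQA answers is across the candidate classes: a near-uniform distribution suggests no bias, while a heavily one-sided tally flags one.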