While vision-language models (VLMs) have recently achieved remarkable performance improvements, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily probed such biases one attribute at a time, ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for the various combinations of social attributes from existing datasets. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intersectional social biases at scale. Our approach uses Stable Diffusion with cross-attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race and gender). We conduct extensive experiments using our generated dataset, which reveal the intersectional social biases present in state-of-the-art VLMs.
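The text side of a counterfactual set can be illustrated with a minimal sketch: for a fixed subject (an occupation), enumerate one caption per combination of race and gender, so that the captions differ only in the intersectional attribute words. The attribute lists, the caption template, and the function names below are illustrative assumptions, not the vocabulary or code used in the paper. The image generation step with Stable Diffusion and cross-attention control is deliberately left out.

```python
from itertools import product

# Illustrative attribute values (assumptions, not the paper's actual lists).
RACES = ["Asian", "Black", "White"]
GENDERS = ["man", "woman"]


def article(word):
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def counterfactual_captions(occupation, races=RACES, genders=GENDERS):
    """Return one caption per (race, gender) pair for a fixed occupation.

    Every caption shares the same subject; only the attribute words change.
    """
    captions = {}
    for race, gender in product(races, genders):
        captions[(race, gender)] = (
            f"A photo of {article(race)} {race} {gender} "
            f"working as {article(occupation)} {occupation}"
        )
    return captions


captions = counterfactual_captions("doctor")
for key, text in captions.items():
    print(key, "->", text)
```

Each caption in such a set would then serve as a prompt for one image, with the shared subject held as constant as possible across the set.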