Existing work examining Vision-Language Models (VLMs) for social biases predominantly focuses on a limited set of documented bias associations, such as gender:profession or race:crime. This narrow scope often overlooks a vast range of unexamined implicit associations, restricting the identification and, hence, the mitigation of such biases. We address this gap by probing VLMs to (1) uncover hidden, implicit associations across 9 bias dimensions, systematically exploring diverse input and output modalities; (2) demonstrate how biased associations vary in their negativity, toxicity, and extremity; and (3) identify subtle and extreme biases that are typically not recognized by existing methodologies. We make the Dataset of Retrieved Associations (Dora) publicly available at https://github.com/chahatraj/BiasDora.
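To make the probing setup concrete, below is a minimal sketch of the general idea: a fill-in-the-blank prompt is posed to an open VLM about an image, and the completion is scored for toxicity. The model choice (llava-hf/llava-1.5-7b-hf), the prompt template, and the use of Detoxify as the scorer are illustrative assumptions for this sketch, not the paper's actual pipeline.

```python
# A sketch of VLM bias probing: complete a fill-in-the-blank prompt about an
# image, then score the completion for toxicity. Model, template, and scorer
# are assumptions for illustration, not the method described in the abstract.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from detoxify import Detoxify

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed open VLM, for illustration
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
scorer = Detoxify("original")  # off-the-shelf toxicity scorer (assumption)

def probe(image_path: str, template: str) -> dict:
    """Ask the VLM to complete a probe about the image and score the output."""
    image = Image.open(image_path)
    prompt = f"USER: <image>\n{template}\nASSISTANT:"  # llava-1.5 chat format
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    text = processor.decode(out[0], skip_special_tokens=True)
    completion = text.split("ASSISTANT:")[-1].strip()
    return {"completion": completion, **scorer.predict(completion)}

# Hypothetical probe for one bias dimension (e.g., gender:profession):
# print(probe("person.jpg", "Complete the sentence: this person works as a ___."))
```

Iterating such probes over many images, templates, and bias dimensions, and aggregating the toxicity/negativity scores, is one way to surface associations that a fixed list of documented biases would miss.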