Vision-language models are increasingly popular and publicly visible tools for generating, editing, and captioning images at scale, but their outputs can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet. Although debiasing methods have been proposed, we argue that existing measurements of model bias lack validity due to dataset bias. We demonstrate that COCO Captions, the most commonly used dataset for evaluating bias, contains spurious correlations between background context and the gender of the people depicted. This is problematic because commonly used bias metrics (such as Bias@K) rely on per-gender base rates. To address this issue, we propose a novel dataset debiasing pipeline that augments the COCO dataset with synthetic, gender-balanced contrast sets, in which only the gender of the subject is edited and the background is held fixed. Because existing image editing methods have limitations and sometimes produce low-quality images, we introduce a method to automatically filter the generated images based on their similarity to real images. Using our balanced synthetic contrast sets, we benchmark bias in multiple CLIP-based models and demonstrate how metrics are skewed by imbalance in the original COCO images. Our results indicate that the proposed approach improves the validity of the evaluation, ultimately contributing to a more realistic understanding of bias in vision-language models.
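To illustrate why reliance on per-gender base rates matters, the following minimal sketch shows a retrieval system that ranks images purely at random and still receives a non-zero score on an imbalanced pool. It assumes the definition of Bias@K commonly used in prior work, (N_male - N_female) / (N_male + N_female) over the top K retrieved images, which the abstract itself does not spell out. The gender labels, pool sizes, and function names are hypothetical and are not taken from COCO or from the paper's code.

```python
import random


def bias_at_k(ranked_genders, k):
    """Signed gender skew among the top-k results.

    Assumed definition: (N_male - N_female) / (N_male + N_female),
    and 0.0 when no gendered item appears in the top k.
    """
    top = ranked_genders[:k]
    n_male = sum(1 for g in top if g == "male")
    n_female = sum(1 for g in top if g == "female")
    if n_male + n_female == 0:
        return 0.0
    return (n_male - n_female) / (n_male + n_female)


def mean_bias_random_ranker(pool, k, trials=2000, seed=0):
    """Average Bias@K of a ranker that orders the pool uniformly at random.

    Such a ranker has no preference of its own, so any systematic
    score it receives comes from the composition of the pool.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        ranking = rng.sample(pool, len(pool))  # random permutation
        total += bias_at_k(ranking, k)
    return total / trials


# Hypothetical pools: a 70/30 split versus a balanced 50/50 split.
imbalanced = ["male"] * 70 + ["female"] * 30
balanced = ["male"] * 50 + ["female"] * 50

if __name__ == "__main__":
    # In expectation the imbalanced pool scores (70 - 30) / 100 = 0.4
    # and the balanced pool scores 0, for the same random ranker.
    print("imbalanced pool:", mean_bias_random_ranker(imbalanced, k=10))
    print("balanced pool:  ", mean_bias_random_ranker(balanced, k=10))
```

Under these assumptions, the score on the imbalanced pool reflects the dataset's composition rather than any preference of the ranker, which is the kind of skew a gender-balanced evaluation set is meant to remove.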