Data Matters Most: Auditing Social Bias in Contrastive Vision Language Models

Vision-language models (VLMs) deliver strong zero-shot recognition but frequently inherit social biases from their training data. We systematically disentangle three design factors (model size, training-data scale, and training-data source) by comparing CLIP and OpenCLIP, two models that share an identical contrastive objective yet differ in encoder width and in the image-text corpora on which they are pre-trained (400M proprietary pairs vs. 400M/2B LAION pairs). Across balanced face-analysis benchmarks, enlarging the encoder reduces gender skew in CLIP but amplifies both gender and racial skew in OpenCLIP; growing the LAION corpus from 400M to 2B pairs further increases OpenCLIP's bias. At matched model and data budgets, substituting LAION for the proprietary data improves gender fairness while increasing racial skew, underscoring data source as the primary driver of bias patterns. We also evaluate three post-hoc, test-time debiasing strategies: Bias Prompts, Prompt Array, and SANER. Debiasing reduces but does not eliminate harm, and its effectiveness depends on data source and model size: Bias Prompts most effectively reduce gender skew in CLIP at smaller model sizes, whereas Prompt Array and SANER more reliably reduce racial skew in OpenCLIP; scaling the LAION corpus changes which method is fairest. Taken together, these findings challenge the assumption that bigger models or datasets are automatically fairer and foreground training-data source as the key determinant of both bias and mitigation efficacy. We release code and evaluation scripts to enable transparent, reproducible auditing of future VLMs.
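To make the notion of "skew" on a balanced benchmark concrete, the following minimal sketch computes a max-skew style score: the largest log-ratio between a group's observed share among the items a model selects for a neutral query and its expected share in the balanced pool. This is one commonly used formulation and is an assumption here; the abstract does not specify the exact metric. The function name `max_skew`, the group labels, and the toy inputs are hypothetical.

```python
import math
from collections import Counter


def max_skew(selected_groups, all_groups, eps=1e-12):
    """Return (max skew, per-group skew) for a list of selected items.

    selected_groups: group label of each item the model selected for a
        neutral query (hypothetical input, e.g. top-k retrieved faces).
    all_groups: the groups present in the benchmark; a balanced benchmark
        is assumed, so each group's desired share is 1 / len(all_groups).
    """
    k = len(selected_groups)
    if k == 0:
        raise ValueError("selected_groups must not be empty")

    counts = Counter(selected_groups)
    desired = 1.0 / len(all_groups)

    skews = {}
    for group in all_groups:
        observed = counts.get(group, 0) / k
        # eps avoids log(0) when a group is entirely absent from the selection.
        skews[group] = math.log(max(observed, eps) / desired)

    # A value of 0 means every group appears at its expected rate;
    # larger values mean some group is over-represented.
    return max(skews.values()), skews


# Toy example: 8 of 10 selected images are labelled "male".
score, per_group = max_skew(["male"] * 8 + ["female"] * 2, ["female", "male"])
print(round(score, 3), per_group)
```

Under this formulation, comparing the score before and after applying a test-time debiasing method, at a fixed model size and training corpus, gives a simple indication of whether the method reduced skew for that configuration.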
