Vision-language models (VLMs) pre-trained on extensive datasets can inadvertently learn biases by correlating gender information with specific objects or scenarios. Current methods, which modify inputs and monitor changes in the model's output probability scores, struggle to characterize bias at the level of individual model components. We propose a framework that incorporates causal mediation analysis to measure and map the pathways of bias generation and propagation within VLMs. This approach lets us identify both the direct effects of interventions on model bias and the indirect effects mediated through different model components. Our results show that image features are the primary contributors to bias, with significantly higher impact than text features, accounting for 32.57% and 12.63% of the bias on the MSCOCO and PASCAL-SENTENCE datasets, respectively. Notably, the image encoder's contribution surpasses those of the text encoder and the deep fusion encoder. Further experiments confirm that the contributions of the language and vision modalities are aligned and non-conflicting. Consequently, blurring gender representations within the image encoder, the component that contributes most to model bias, efficiently reduces bias by 22.03% and 9.04% on MSCOCO and PASCAL-SENTENCE, respectively, with minimal performance loss and minimal additional computational cost.
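The direct/indirect decomposition that causal mediation analysis provides can be illustrated with a minimal toy sketch. This is not the paper's model: the `mediator` and `outcome` functions below are hypothetical linear stand-ins for a component (e.g., the image encoder's features) and a scalar bias score, chosen only to show how the total effect of an input intervention splits into a natural direct effect (NDE) and a natural indirect effect (NIE) routed through the mediator.

```python
# Toy sketch of causal mediation analysis (hypothetical linear model,
# not the paper's VLM). Pipeline: input x -> mediator m(x)
# (e.g., image-encoder features) -> outcome y(x, m) (e.g., a bias score).

def mediator(x):
    # Stand-in for a model component's gendered feature.
    return 2.0 * x

def outcome(x, m):
    # Stand-in for the bias score, depending on the raw input
    # (direct path) and on the mediator (indirect path).
    return 0.5 * x + 1.5 * m

x0, x1 = 0.0, 1.0  # baseline vs. intervened input (e.g., gender cue swapped)

# Total effect of the input intervention on the bias score.
total_effect = outcome(x1, mediator(x1)) - outcome(x0, mediator(x0))

# Natural direct effect: change the input, but hold the mediator
# fixed at its value under the baseline input.
nde = outcome(x1, mediator(x0)) - outcome(x0, mediator(x0))

# Natural indirect effect: keep the baseline input, but swap in the
# mediator value produced by the intervened input.
nie = outcome(x0, mediator(x1)) - outcome(x0, mediator(x0))

print(total_effect, nde, nie)  # 3.5 = 0.5 + 3.0
```

In this linear toy model the decomposition is exact (total effect = NDE + NIE); in a nonlinear network the split is estimated by the same style of counterfactual intervention on each component's activations, which is how a per-component bias contribution can be attributed.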