Large-scale vision-language models, such as CLIP, are known to contain harmful societal biases regarding protected attributes (e.g., gender and age). In this paper, we aim to address the problem of societal bias in CLIP. Although previous studies have proposed debiasing methods based on adversarial learning or test-time projection, our comprehensive study of these works identifies two critical limitations: 1) loss of attribute information when it is explicitly disclosed in the input, and 2) reliance on attribute annotations during the debiasing process. To mitigate societal bias in CLIP while overcoming these limitations, we introduce a simple yet effective debiasing method called SANER (societal attribute neutralizer) that removes attribute information from CLIP text features only for attribute-neutral descriptions. Experimental results show that SANER, which requires no attribute annotations and preserves the original information for attribute-specific descriptions, achieves superior debiasing performance compared to existing methods.
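To make the selective-neutralization idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes CLIP text features (stubbed here with a random tensor), a hypothetical keyword list for detecting attribute-neutral captions, and a small residual MLP standing in for the debiasing module, which in practice would be trained so that protected attributes cannot be recovered from its output.

```python
# A minimal sketch of the idea described above, NOT the SANER implementation.
# Assumptions: 512-d CLIP text features (stubbed), a hypothetical attribute
# word list, and an untrained residual MLP as the neutralizer.
import torch
import torch.nn as nn

# Illustrative (incomplete) list of protected-attribute words.
GENDER_WORDS = {"man", "woman", "male", "female", "he", "she"}

def is_attribute_neutral(caption: str) -> bool:
    """Return True if the caption mentions no protected-attribute words."""
    return not (set(caption.lower().split()) & GENDER_WORDS)

class AttributeNeutralizer(nn.Module):
    """Residual MLP mapping a text feature to an attribute-neutral feature."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return feat + self.mlp(feat)

def debias_text_feature(caption: str, feat: torch.Tensor,
                        neutralizer: AttributeNeutralizer) -> torch.Tensor:
    # Key property from the abstract: only attribute-neutral descriptions are
    # modified; attribute-specific descriptions keep their original features.
    if is_attribute_neutral(caption):
        return neutralizer(feat)
    return feat

# Usage with a stubbed CLIP text feature.
neutralizer = AttributeNeutralizer(dim=512)
feat = torch.randn(1, 512)  # stand-in for clip_model.encode_text(tokens)
out = debias_text_feature("a photo of a doctor", feat, neutralizer)  # neutralized
kept = debias_text_feature("a photo of a woman", feat, neutralizer)  # unchanged
```

The conditional branch captures the property the abstract emphasizes: attribute-specific descriptions pass through untouched, so no attribute information is lost when it is explicitly disclosed in the input.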