Image captioning models are known to perpetuate and amplify harmful societal biases present in their training data. In this work, we aim to mitigate one such bias, gender bias, in image captioning models. Prior work has addressed this problem by forcing models to focus on people in order to reduce gender misclassification. However, this comes at a cost: in the course of predicting the correct gender, such models conversely generate gender-stereotypical words. From this observation, we hypothesize that two types of gender bias affect image captioning models: 1) bias that exploits context to predict gender, and 2) bias in which gender shifts the probability of generating certain (often stereotypical) words. To mitigate both types of gender bias, we propose a framework called LIBRA that learns from synthetically biased samples, correcting gender misclassification and changing gender-stereotypical words to more neutral ones. Code is available at https://github.com/rebnej/LIBRA.
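The abstract does not describe how the synthetically biased samples are built. As a rough illustration only, one simple way to simulate the first bias type (gender misclassification) is to flip the gendered words in a ground-truth caption and pair the result with the original, so that a debiasing model can be trained to map the biased caption back to the correct one. The sketch below assumes exactly that; the tiny lexicon and the names `GENDER_SWAP`, `swap_gender_words`, and `make_training_pair` are hypothetical and are not taken from the LIBRA code.

```python
import re

# Hypothetical, deliberately tiny gender-word lexicon (an assumption for
# illustration; a real system would need a much larger, curated list).
GENDER_SWAP = {
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "boy": "girl", "girl": "boy",
    "he": "she", "she": "he",
}

# Whole-word match only, so "man" does not match inside "woman".
_PATTERN = re.compile(r"\b(" + "|".join(GENDER_SWAP) + r")\b", flags=re.IGNORECASE)


def swap_gender_words(caption: str) -> str:
    """Flip every lexicon word, keeping an initial capital letter if present."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = GENDER_SWAP[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped

    return _PATTERN.sub(repl, caption)


def make_training_pair(caption: str) -> tuple[str, str]:
    """Return (synthetically biased input, clean target) for a hypothetical debiasing model."""
    return swap_gender_words(caption), caption


if __name__ == "__main__":
    print(make_training_pair("A woman cooking in a kitchen."))
```

Gender swapping only simulates misclassification; simulating the second bias type (stereotypical word choice) would need a separate corruption step, which this sketch does not attempt.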