As computer vision systems become more widely deployed, there is increasing concern from both the research community and the public that these systems are not only reproducing but also amplifying harmful social biases. Bias amplification, the focus of this work, refers to models amplifying inherent training set biases at test time. Existing metrics measure bias amplification with respect to single annotated attributes (e.g., $\texttt{computer}$). However, several visual datasets consist of images with multiple attribute annotations. We show that models can learn to exploit correlations with respect to multiple attributes (e.g., {$\texttt{computer}$, $\texttt{keyboard}$}), which current metrics do not account for. In addition, we show that current metrics can give the erroneous impression that minimal or no bias amplification has occurred, because they aggregate over positive and negative values. Further, these metrics lack a clear desired value, making them difficult to interpret. To address these shortcomings, we propose a new metric: Multi-Attribute Bias Amplification. We validate our proposed metric through an analysis of gender bias amplification on the COCO and imSitu datasets. Finally, we benchmark bias mitigation methods using our proposed metric, suggesting possible avenues for future bias mitigation research.
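The point about aggregating over positive and negative values can be illustrated with a toy calculation. The sketch below uses made-up signed per-attribute scores (the numbers and the names `attribute_c` and `attribute_d` are hypothetical, not taken from the paper) and assumes a metric that averages those signed scores. The absolute-value aggregate is shown only as one simple way to avoid cancellation; it is not presented as the definition of the proposed metric.

```python
from statistics import mean

# Hypothetical signed per-attribute bias amplification scores.
# Positive and negative values indicate amplification in opposite directions.
# All values are invented for illustration only.
signed_scores = {
    "computer": 0.08,
    "keyboard": -0.08,
    "attribute_c": 0.05,
    "attribute_d": -0.05,
}


def mean_signed(scores):
    """Average the signed scores; opposite signs can cancel out."""
    return mean(scores.values())


def mean_absolute(scores):
    """Average the magnitudes; cancellation cannot occur."""
    return mean(abs(value) for value in scores.values())


signed = mean_signed(signed_scores)
magnitude = mean_absolute(signed_scores)

print(f"mean of signed scores:   {signed:+.3f}")
print(f"mean of absolute scores: {magnitude:.3f}")
```

In this toy case the signed average is zero even though every attribute shows a nonzero score, which is the kind of misleading summary the abstract describes.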