Machine learning model bias can arise from dataset composition: sensitive features that are correlated with the learning target distort the model's decision rule and lead to performance differences across those features. Existing de-biasing work captures prominent and subtle image features that are traceable in the model's latent space, such as the colors of digits or the backgrounds of animals. However, the latent space alone is not sufficient to understand all feature correlations in a dataset. In this work, we propose a framework that extracts feature clusters from a dataset based on image descriptions, allowing us to capture both subtle and coarse image features. We formulate the feature co-occurrence pattern and measure the correlation between features, with a human in the loop for examination. The analyzed features and correlations are human-interpretable, so we name the method Common-Sense Bias Discovery (CSBD). Having exposed sensitive correlations in a dataset, we demonstrate that downstream model bias can be mitigated by adjusting image sampling weights, without requiring supervision from sensitive group labels. Experiments show that our method discovers novel biases on multiple classification tasks for two benchmark image datasets, and that the intervention outperforms state-of-the-art unsupervised bias mitigation methods.
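To make the idea of measuring a feature correlation and then adjusting sampling weights concrete, the following is a minimal sketch and not the CSBD implementation. It assumes each image has already been reduced to a binary indicator for one description-derived feature and a binary target label. The phi coefficient and the reweighting rule `P(f) P(t) / P(f, t)` are stand-in choices for illustration, and the function names `phi_coefficient` and `decorrelating_weights` are hypothetical.

```python
from collections import Counter
from math import sqrt


def phi_coefficient(a, b, weights=None):
    """Phi (Pearson) correlation between two 0/1 sequences.

    Optional per-sample weights give the correlation under a
    reweighted sampling distribution. This metric is an assumption
    chosen for illustration, not necessarily the paper's measure.
    """
    if weights is None:
        weights = [1.0] * len(a)
    n = sum(weights)
    n11 = sum(w for x, y, w in zip(a, b, weights) if x == 1 and y == 1)
    n1_ = sum(w for x, w in zip(a, weights) if x == 1)
    n_1 = sum(w for y, w in zip(b, weights) if y == 1)
    denom = sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    # A constant feature or target has no defined correlation; report 0.
    return 0.0 if denom == 0 else (n * n11 - n1_ * n_1) / denom


def decorrelating_weights(feature, target):
    """Per-sample weights w = P(f) * P(t) / P(f, t).

    Under these weights the feature and the target are statistically
    independent, so over-represented (feature, target) combinations
    are sampled less often and rare ones more often.
    """
    n = len(feature)
    joint = Counter(zip(feature, target))
    f_count = Counter(feature)
    t_count = Counter(target)
    return [
        f_count[f] * t_count[t] / (n * joint[(f, t)])
        for f, t in zip(feature, target)
    ]


# Toy data: the feature co-occurs with the positive target in most samples.
feature = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 1, 0, 0, 0, 0, 1]

before = phi_coefficient(feature, target)
weights = decorrelating_weights(feature, target)
after = phi_coefficient(feature, target, weights)
```

The resulting weights could then be passed to any weighted sampler, for example `random.choices(indices, weights=weights, k=batch_size)` from the Python standard library, so that no sensitive group label is needed at training time beyond the discovered feature indicator.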