Machine learning model bias can arise from dataset composition: sensitive features that are correlated with the learning target distort the model's decision rule and lead to performance disparities across feature groups. Existing de-biasing work captures prominent and fine-grained image features that are traceable in the model's latent space, such as the colors of digits or the backgrounds of animal images. However, the latent space alone is not sufficient to reveal all feature correlations in a dataset. In this work, we propose a framework that extracts feature clusters from a dataset based on image descriptions, allowing us to capture both subtle and coarse features of the images. We formulate the feature co-occurrence pattern and measure correlations, using a human in the loop for examination. Because the analyzed features and correlations are human-interpretable, we name the method Common-Sense Bias Discovery (CSBD). Having exposed sensitive correlations in a dataset, we demonstrate that downstream model bias can be mitigated by adjusting image sampling weights, without requiring sensitive group label supervision. Experiments show that our method discovers novel biases in multiple classification tasks on two benchmark image datasets, and that the intervention outperforms state-of-the-art unsupervised bias mitigation methods.
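The two steps summarized above, measuring a correlation between a description-derived feature and the target and then reweighting images, can be illustrated with a minimal sketch. The abstract does not specify the correlation measure or the weighting scheme. This sketch assumes a binary feature indicator per image (for example, whether the image's description falls into a given feature cluster) and uses the phi coefficient as the correlation measure. It also uses inverse (feature, label) group frequency as the sampling weight. These choices, the function names `phi_coefficient` and `group_balanced_weights`, and the toy data are hypothetical illustrations, not the CSBD implementation.

```python
from collections import Counter
from math import sqrt


def phi_coefficient(a, b):
    """Phi coefficient (Pearson correlation) between two equal-length 0/1 lists.

    Assumed correlation measure for illustration; not taken from the paper.
    """
    n = len(a)
    n11 = sum(1 for x, y in zip(a, b) if x and y)  # both indicators equal 1
    n1_ = sum(a)  # number of items where a == 1
    n_1 = sum(b)  # number of items where b == 1
    denom = sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    if denom == 0:  # one indicator is constant, so correlation is undefined; report 0
        return 0.0
    return (n * n11 - n1_ * n_1) / denom


def group_balanced_weights(feature, label):
    """Per-image sampling weights giving every (feature, label) group equal total mass.

    Hypothetical reweighting scheme: each image gets 1 / (size of its group),
    so rare feature/label combinations are sampled as often as common ones.
    """
    counts = Counter(zip(feature, label))
    return [1.0 / counts[(f, y)] for f, y in zip(feature, label)]


# Hypothetical toy data: feature[i] = 1 if image i's description belongs to
# some feature cluster (e.g. "water background"); label[i] is the class target.
feature = [1, 1, 1, 0, 1, 0, 0, 0]
label = [1, 1, 1, 1, 0, 0, 0, 0]

print(phi_coefficient(feature, label))  # 0.5
print(group_balanced_weights(feature, label))
```

In a PyTorch training loop, a weight list like this could be passed to `torch.utils.data.WeightedRandomSampler` so that images from under-represented feature/label combinations are drawn more often. The group here comes from the description-derived feature rather than from an annotated sensitive attribute.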