Mitigating bias in machine learning models is critical for ensuring fairness and equity. In this paper, we propose a novel approach that uses pixel-level image attributions to identify and regularize the regions of an image that carry significant information about bias attributes. We extract these attributions in a model-agnostic way, using a convolutional neural network (CNN) classifier trained on small image patches. Because the classifier is trained to predict a property of the entire image from a single patch, its outputs yield region-based attributions that show how the relevant information is distributed across the image. We then use these attributions to inject targeted noise into datasets whose confounding attributes bias the data, which keeps neural networks from learning those biases and steers them toward the primary attributes. The approach proves effective at training unbiased classifiers on heavily biased datasets.
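The pipeline described above (score each patch, read the scores as a region-level attribution map, then perturb the high-attribution regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the noise model, so Gaussian noise scaled by min-max-normalized attributions is an assumption made here for illustration. The names `patch_attribution_map`, `add_targeted_noise`, `patch_scorer`, `patch`, and `sigma` are all hypothetical, and `patch_scorer` stands in for the trained patch-level CNN's confidence in the bias attribute.

```python
import numpy as np


def patch_attribution_map(image, patch_scorer, patch=8):
    """Score every non-overlapping patch; returns a (H // patch, W // patch) grid.

    `patch_scorer` is a hypothetical stand-in for a trained patch classifier:
    it maps one patch to a scalar saying how much that patch reveals about
    the bias attribute.
    """
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    grid = np.zeros((rows, cols), dtype=np.float64)
    for r in range(rows):
        for c in range(cols):
            tile = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            grid[r, c] = patch_scorer(tile)
    return grid


def add_targeted_noise(image, grid, patch=8, sigma=0.5, rng=None):
    """Add Gaussian noise whose strength follows the attribution grid.

    Assumes pixel values in [0, 1]. The noise model is an illustrative
    assumption, not the scheme used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Min-max normalize attributions to [0, 1]; a flat grid means no noise.
    span = grid.max() - grid.min()
    weights = (grid - grid.min()) / span if span > 0 else np.zeros_like(grid)
    # Upsample the patch grid to pixel resolution (each cell becomes a block).
    mask = np.kron(weights, np.ones((patch, patch)))
    h, w = mask.shape
    noisy = image.astype(np.float64).copy()
    noise = rng.normal(0.0, sigma, size=noisy[:h, :w].shape)
    if noisy.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over colour channels
    noisy[:h, :w] += mask * noise
    return np.clip(noisy, 0.0, 1.0)


if __name__ == "__main__":
    # Toy image: a bright top-left corner plays the role of the bias cue.
    img = np.zeros((16, 16))
    img[:8, :8] = 0.5
    # Toy scorer: mean intensity (a real scorer would be a trained CNN).
    attributions = patch_attribution_map(img, lambda tile: float(tile.mean()))
    debiased = add_targeted_noise(img, attributions, rng=np.random.default_rng(0))
    print(attributions)
    print(np.abs(debiased - img).max(axis=1).round(2))
```

In this toy run only the top-left block receives noise, because it is the only patch with a non-minimal score; in practice the scorer would be the patch classifier trained to predict the bias attribute, and the noisy images would be used to train the downstream classifier.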