It is widely recognized that deep neural networks are sensitive to bias in the data. During training, such models are likely to learn spurious correlations between data and labels, resulting in limited generalization and poor performance. In this context, model debiasing approaches can be devised to reduce the model's dependency on such unwanted correlations, either exploiting prior knowledge of the bias or not. In this work, we focus on the latter, more realistic scenario, showing that accurately identifying bias-conflicting and bias-aligned samples is key to obtaining compelling bias-mitigation performance. On these grounds, we propose to frame the problem of model bias from an out-of-distribution perspective, introducing a new bias identification method based on anomaly detection. We claim that, when the data is mostly biased, bias-conflicting samples can be regarded as outliers with respect to the bias-aligned distribution in the feature space of a biased model, and can therefore be detected precisely with an anomaly detection method. Coupling the proposed bias identification approach with bias-conflicting data upsampling and augmentation in a two-step strategy, we reach state-of-the-art performance on synthetic and real benchmark datasets. Ultimately, our approach shows that the data bias issue does not necessarily require complex debiasing methods, provided an accurate bias identification procedure is in place. Source code is available at https://github.com/Malga-Vision/MoDAD
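The identification step outlined above can be sketched in a few lines, assuming features extracted from a biased model. This is a minimal illustration only: the distance-to-centroid anomaly score, the `contamination` parameter, and the function name are assumptions for exposition, not the paper's exact procedure (which may use a different anomaly detector).

```python
import numpy as np

def identify_bias_conflicting(features, labels, contamination=0.05):
    """Hypothetical sketch: flag samples whose features lie far from their
    class centroid in a biased model's feature space. Under heavy bias,
    bias-aligned samples dominate each class, so the centroid reflects the
    bias-aligned distribution and distant samples act as outliers
    (bias-conflicting candidates)."""
    flags = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = features[idx].mean(axis=0)
        # Anomaly score: Euclidean distance to the class centroid.
        dists = np.linalg.norm(features[idx] - centroid, axis=1)
        # Flag the assumed fraction of most anomalous samples per class.
        k = max(1, int(contamination * len(idx)))
        flags[idx[np.argsort(dists)[-k:]]] = True
    return flags
```

The returned boolean mask could then drive the second step of a two-step strategy, e.g. by upsampling and augmenting the flagged samples before retraining.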