Deep learning models are known to suffer from bias, and researchers have been exploring methods to address this issue. However, most of these methods require prior knowledge of the bias and are therefore not always practical. We focus on a more practical setting in which no prior information about the bias is available. In this setting, the training data typically contains a large number of bias-aligned samples, which cause the model to produce biased predictions, and only a few bias-conflicting samples, which do not conform to the bias. When training data is limited, the influence of the bias-aligned samples on the model's predictions may become even stronger, and we demonstrate experimentally that existing debiasing techniques suffer severely in such cases. We examine the effects of unknown bias in small-dataset regimes and present a novel approach to mitigate it. The proposed approach directly addresses the extremely low occurrence of bias-conflicting samples in limited-data settings by synthesizing hybrid samples that can be used to reduce the effect of the bias. Extensive experiments on several benchmark datasets show that the proposed approach is effective at mitigating unknown bias when data is limited. Specifically, our approach outperforms the vanilla baseline and the LfF, LDD, and DebiAN debiasing methods by absolute margins of 10.39%, 9.08%, 8.07%, and 9.67%, respectively, when only 10% of the Corrupted CIFAR-10 Type 1 dataset is available with a bias-conflicting sample ratio of 0.05.
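To give a sense of how scarce bias-conflicting samples become in the limited-data setting described above, the following is a minimal arithmetic sketch, not part of the proposed method. It rests on two assumptions that the abstract does not state: that the full training pool holds 50,000 images (the size of the standard CIFAR-10 training split) and that the ratio of 0.05 is a fraction of the retained subset (that is, 5%). The function name `count_bias_conflicting` is hypothetical.

```python
def count_bias_conflicting(total_samples: int,
                           data_fraction: float,
                           conflict_ratio: float) -> int:
    """Estimate how many bias-conflicting samples remain after subsampling.

    total_samples  : size of the full training pool (assumed, not from the abstract)
    data_fraction  : fraction of the pool that is available (0.10 for "10%")
    conflict_ratio : fraction of the available data that is bias-conflicting
                     (0.05 is assumed here to mean 5%)
    """
    subset_size = round(total_samples * data_fraction)
    return round(subset_size * conflict_ratio)


# Assumed pool of 50,000 images, 10% available, ratio 0.05.
ASSUMED_POOL_SIZE = 50_000
subset_size = round(ASSUMED_POOL_SIZE * 0.10)
num_conflicting = count_bias_conflicting(ASSUMED_POOL_SIZE, 0.10, 0.05)
num_aligned = subset_size - num_conflicting

print(f"available samples:        {subset_size}")
print(f"bias-conflicting samples: {num_conflicting}")
print(f"bias-aligned samples:     {num_aligned}")
```

Under these assumptions the model would see 250 bias-conflicting samples against 4,750 bias-aligned ones, or roughly 25 per class if they are spread evenly over ten classes, which illustrates why synthesizing additional samples is attractive in this regime.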