Deep learning models are known to suffer from bias, and researchers have explored many methods to address it. However, most of these methods require prior knowledge of the bias and are therefore not always practical. In this paper, we focus on a more practical setting in which no prior information about the bias is available. In this setting, the training data typically contains a large number of bias-aligned samples, which push the model toward biased predictions, and only a few bias-conflicting samples, which do not conform to the bias. When training data is limited, the influence of the bias-aligned samples on the model's predictions may become even stronger, and we show experimentally that existing debiasing techniques suffer severely in such cases. We examine the effects of unknown bias in small-dataset regimes and present a novel approach to mitigate it. The proposed approach directly addresses the extremely low occurrence of bias-conflicting samples in limited-data settings by synthesizing hybrid samples that can be used to reduce the effect of bias. Extensive experiments on several benchmark datasets demonstrate the effectiveness of the proposed approach in addressing unknown bias when data is limited. Specifically, our approach outperforms the vanilla baseline and the LfF, LDD, and DebiAN debiasing methods by absolute margins of 10.39%, 9.08%, 8.07%, and 9.67%, respectively, when only 10% of the Corrupted CIFAR-10 Type 1 dataset is available with a bias-conflicting sample ratio of 0.05.