Deep neural networks often struggle to learn robust representations in the presence of dataset biases, which leads to poor generalization on unbiased datasets. This limitation arises because models come to depend heavily on peripheral and confounding factors that they inadvertently acquire during training. Existing approaches to this problem typically require explicit supervision of bias attributes or rely on prior knowledge about the biases. In this study, we address the more challenging scenario in which no explicit bias annotations are available and nothing is known in advance about the nature of the bias. We present a fully unsupervised debiasing framework with three key steps: first, leveraging the inherent tendency of networks to learn malignant biases to obtain a bias-capturing model; second, employing a pseudo-labeling process to obtain bias labels; and finally, applying recent supervised debiasing techniques to obtain an unbiased model. Additionally, we introduce a theoretical framework for evaluating model biasedness and conduct a detailed analysis of how biases impact neural network training. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance in various settings and occasionally surpasses fully supervised debiasing approaches.
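The three steps above can be illustrated with a toy sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: a two-feature synthetic dataset, logistic regression in place of a deep network, a few gradient steps as the way to obtain a bias-capturing model, agreement between that model's prediction and the class label as the pseudo bias label, and inverse group-frequency reweighting as a stand-in for a supervised debiasing technique. All function names (`make_data`, `train_logreg`, `run_pipeline`) are hypothetical.

```python
import numpy as np


def make_data(n, p_align, rng):
    """Toy data (assumed setup): a noisy core feature and an easy, large-scale bias feature."""
    y = rng.integers(0, 2, size=n)
    aligned = rng.random(n) < p_align           # bias attribute agrees with the label
    b = np.where(aligned, y, 1 - y)             # true bias attribute, used only for evaluation
    x_core = (2 * y - 1) + rng.normal(size=n)
    x_bias = 3.0 * (2 * b - 1) + 0.1 * rng.normal(size=n)
    return np.column_stack([x_core, x_bias]), y, b


def train_logreg(X, y, weights=None, steps=500, lr=0.2):
    """Logistic regression without intercept, trained by (weighted) gradient descent."""
    n, d = X.shape
    if weights is None:
        weights = np.full(n, 1.0 / n)
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (weights * (p - y))
    return w


def predict(X, w):
    return (X @ w > 0).astype(int)


def run_pipeline(seed=0):
    rng = np.random.default_rng(seed)
    X_tr, y_tr, b_tr = make_data(4000, 0.95, rng)   # biased training set
    X_te, y_te, _ = make_data(4000, 0.50, rng)      # unbiased test set

    # Step 1 (assumption): stop training very early so the model mostly
    # picks up the easy bias feature, giving a bias-capturing model.
    w_bias = train_logreg(X_tr, y_tr, steps=5)

    # Step 2 (assumption): a sample is pseudo-labeled "bias-conflicting"
    # when the bias-capturing model misclassifies it.
    conflict = (predict(X_tr, w_bias) != y_tr).astype(int)

    # Step 3 (stand-in technique): reweight so that every
    # (class, pseudo bias label) group carries equal total weight.
    group = 2 * y_tr + conflict
    counts = np.bincount(group, minlength=4)
    weights = 1.0 / counts[group]
    weights = weights / weights.sum()
    w_debiased = train_logreg(X_tr, y_tr, weights=weights)

    # Baseline: plain training on the biased data.
    w_erm = train_logreg(X_tr, y_tr)

    return {
        "pseudo_label_agreement": float(np.mean(conflict == (b_tr != y_tr))),
        "erm_acc": float(np.mean(predict(X_te, w_erm) == y_te)),
        "debiased_acc": float(np.mean(predict(X_te, w_debiased) == y_te)),
    }


if __name__ == "__main__":
    print(run_pipeline())
```

In this toy setting the true bias attribute is used only to check how well the pseudo labels recover it; the pipeline itself never reads it. Reweighting is only one of many supervised debiasing methods that could be plugged into the third step.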