Neural networks tend to become biased toward spurious correlations between classes and latent attributes that appear in the majority of the training data, which harms their generalization. We propose a new method for training debiased classifiers without spurious attribute labels. The key idea is to employ a committee of classifiers as an auxiliary module that identifies bias-conflicting data, i.e., data without the spurious correlation, and assigns them large weights when training the main classifier. The committee is learned as a bootstrapped ensemble so that a majority of its classifiers are biased yet diverse, and therefore intentionally fail to predict the classes of bias-conflicting data. The consensus within the committee on prediction difficulty thus provides a reliable cue for identifying and weighting bias-conflicting data. Moreover, the committee is also trained with knowledge transferred from the main classifier, so that it gradually becomes debiased along with the main classifier and emphasizes more difficult data as training progresses. On five real-world datasets, our method outperforms prior methods that, like ours, use no spurious attribute labels, and occasionally even surpasses those that rely on bias labels.
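The weighting idea described above can be illustrated with a minimal sketch. It assumes the committee's predictions are already available as plain label lists. The function names, the `alpha` smoothing constant, and the specific form `1 / (fraction_correct + alpha)` are hypothetical choices made for this illustration and are not taken from the abstract. Training of the committee and the knowledge transfer from the main classifier are omitted.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement: one bootstrap resample for one committee member."""
    return [rng.randrange(n) for _ in range(n)]

def consensus_weights(committee_preds, labels, alpha=0.1):
    """Give larger weights to samples that few committee members classify correctly.

    committee_preds: list of m lists, each holding one predicted label per sample.
    The form 1 / (fraction_correct + alpha) is an assumed choice for illustration.
    """
    m = len(committee_preds)
    weights = []
    for i, y in enumerate(labels):
        correct = sum(1 for preds in committee_preds if preds[i] == y)
        weights.append(1.0 / (correct / m + alpha))
    return weights

def weighted_loss(per_sample_losses, weights):
    """Weighted average of per-sample losses for the main classifier."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, per_sample_losses)) / total

# Toy example: three committee members, four samples.
labels = [0, 1, 0, 1]
committee_preds = [
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
]
weights = consensus_weights(committee_preds, labels)
resample = bootstrap_indices(len(labels), random.Random(0))
```

In this toy example every committee member misclassifies the third sample, so it receives the largest weight. It plays the role of a bias-conflicting sample, while the samples that the whole committee classifies correctly receive the smallest weights.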