Spurious correlations can cause strong biases in deep neural networks, impairing their generalization ability. While most existing debiasing methods require full supervision on either spurious attributes or target labels, training a debiased model from a limited amount of both annotations remains an open question. To address this issue, we investigate an interesting phenomenon using spectral analysis of latent representations: spuriously correlated attributes make neural networks inductively biased towards encoding representations of lower effective rank. We also show that rank regularization can amplify this bias in a way that encourages highly correlated features. Leveraging these findings, we propose a self-supervised debiasing framework that is potentially compatible with unlabeled samples. Specifically, we first pretrain a biased encoder in a self-supervised manner with rank regularization, which serves as a semantic bottleneck that forces the encoder to learn the spuriously correlated attributes. This biased encoder is then used to discover and upweight bias-conflicting samples in a downstream task, acting as a boosting step that effectively debiases the main model. Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines and, in some cases, even outperforms state-of-the-art supervised debiasing approaches.
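To make the notion of effective rank in the abstract concrete, the sketch below computes one common entropy-based variant: the exponential of the Shannon entropy of the normalized singular values of a feature matrix. The abstract does not state which definition the authors use, so this choice, the function name `effective_rank`, the `eps` parameter, and the synthetic data are all assumptions made for illustration rather than the paper's procedure.

```python
import numpy as np


def effective_rank(features, eps=1e-12):
    """Entropy-based effective rank of a (num_samples, dim) feature matrix.

    Assumed definition (not confirmed by the abstract): exp of the Shannon
    entropy of the singular values normalized to sum to one.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)
    p = p[p > eps]  # drop numerically zero entries so log(p) is finite
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))


rng = np.random.default_rng(0)

# Hypothetical "unbiased" representation: 16 independent feature dimensions.
diverse = rng.standard_normal((512, 16))

# Hypothetical "biased" representation: every dimension is a scaled copy of
# a single latent attribute, so the features are perfectly correlated.
latent = rng.standard_normal((512, 1))
collapsed = latent @ rng.standard_normal((1, 16))

rank_diverse = effective_rank(diverse)
rank_collapsed = effective_rank(collapsed)
print(f"diverse features:   {rank_diverse:.2f}")
print(f"collapsed features: {rank_collapsed:.2f}")
```

In this toy setting the perfectly correlated features have an effective rank of about 1, while the independent features score close to the full dimension of 16. This is the kind of gap the abstract associates with spuriously correlated attributes, although real encoders would show it far less cleanly.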