Neural networks are often biased toward spuriously correlated features, which provide misleading statistical evidence that does not generalize. This raises an interesting question: ``Does an optimal unbiased functional subnetwork exist in a severely biased network? If so, how can such a subnetwork be extracted?'' While empirical evidence for the existence of such unbiased subnetworks has accumulated, these observations rely mainly on guidance from ground-truth unbiased samples. How to discover the optimal subnetworks in practice, using only biased training datasets, therefore remains unexplored. To address this, we first present a theoretical insight that warns of potential limitations of existing algorithms in exploring unbiased subnetworks in the presence of strong spurious correlations. We then elucidate the importance of bias-conflicting samples for structure learning. Motivated by these observations, we propose the Debiased Contrastive Weight Pruning (DCWP) algorithm, which probes unbiased subnetworks without expensive group annotations. Experimental results demonstrate that our approach significantly outperforms state-of-the-art debiasing methods despite a considerable reduction in the number of parameters.