Simplicity bias is the concerning tendency of deep networks to over-depend on simple, weakly predictive features, to the exclusion of stronger, more complex features. The result is biased, incorrect predictions in many real-world applications, a problem exacerbated by incomplete training data that contains spurious feature-label correlations. We propose a direct, interventional method for addressing simplicity bias in deep neural networks (DNNs), which we call the feature sieve. The method automatically identifies and suppresses easily computable spurious features in the lower layers of the network, allowing the higher layers to extract and use richer, more meaningful representations. We provide concrete evidence of this differential suppression and enhancement of relevant features on both controlled datasets and real-world images, and report substantial gains on many real-world debiasing benchmarks (11.4% relative gain on ImageNet-A; 3.2% on BAR, etc.). Crucially, we outperform many baselines that incorporate knowledge of known spurious or biased attributes, even though our method uses no such information. We believe that the feature sieve opens up exciting new research directions in automated adversarial feature extraction and representation learning for deep networks.
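The abstract does not spell out how a feature is "suppressed", so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical auxiliary classifier head reads the lower-layer features, and that suppression means pushing that head's prediction toward the uniform distribution, so that the lower layers stop carrying easily decodable label information. The names `forgetting_loss`, `forgetting_grad`, and `suppress` are illustrative, and the logits are updated directly as a stand-in for updating lower-layer weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forgetting_loss(aux_logits):
    """Cross-entropy between a uniform target and the auxiliary prediction.

    Assumption: minimizing this makes the (hypothetical) auxiliary head
    uninformative about the label. Its minimum value is log(k) for k classes.
    """
    probs = softmax(aux_logits)
    k = len(probs)
    return -sum(math.log(p) for p in probs) / k


def forgetting_grad(aux_logits):
    """Gradient of forgetting_loss w.r.t. the logits: softmax(z) - 1/k."""
    probs = softmax(aux_logits)
    k = len(probs)
    return [p - 1.0 / k for p in probs]


def suppress(aux_logits, lr=0.5, steps=200):
    """Plain gradient descent on the logits (stand-in for lower-layer weights)."""
    z = list(aux_logits)
    for _ in range(steps):
        g = forgetting_grad(z)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z


if __name__ == "__main__":
    confident = [4.0, 0.0, -1.0]  # auxiliary head is sure of class 0
    flattened = suppress(confident)
    print("loss before:", forgetting_loss(confident))
    print("loss after: ", forgetting_loss(flattened))
    print("probs after:", softmax(flattened))
```

In a real network the gradient of such a loss would be propagated into the lower-layer parameters while the main classification loss continues to train the full model; how the two are balanced or alternated is a design choice the abstract does not specify.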