Deep Neural Networks are prone to learning spurious correlations embedded in the training data, which can lead to biased predictions. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space in the hope of enforcing the right reasons. We present a novel method that ensures the right reasons on the concept level by reducing the model's sensitivity towards biases through the gradient. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures.
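To make the idea of reducing sensitivity towards a bias "through the gradient" concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. Everything in it is an assumption made for illustration: the penalty is taken to be the squared directional derivative of the output along the bias direction, the Concept Activation Vector is estimated as a normalized difference of class means (a simple stand-in for a robust direction, not necessarily the estimator used in the work), and the model is a linear head `f(a) = w . a` on a 3-dimensional latent vector, so its gradient with respect to the activations is simply `w`. All function names are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def mean_difference_cav(acts_with, acts_without):
    """Unit vector from the mean activation without the concept to the mean with it.

    Assumption: a difference-of-means direction stands in for a 'robust' CAV.
    """
    dim = len(acts_with[0])
    mu_with = [sum(a[i] for a in acts_with) / len(acts_with) for i in range(dim)]
    mu_without = [sum(a[i] for a in acts_without) / len(acts_without) for i in range(dim)]
    diff = [p - q for p, q in zip(mu_with, mu_without)]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]


def sensitivity_penalty(w, cav):
    """Squared directional derivative of f(a) = w . a along the CAV.

    For this linear head the gradient of f with respect to a is w itself.
    """
    return dot(w, cav) ** 2


def train_head(acts, targets, cav, lam, lr=0.05, steps=2000):
    """Gradient descent on mean squared error + lam * sensitivity_penalty."""
    n, dim = len(acts), len(acts[0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for a, y in zip(acts, targets):
            err = dot(w, a) - y
            for i in range(dim):
                grad[i] += 2.0 * err * a[i] / n
        proj = dot(w, cav)
        for i in range(dim):
            grad[i] += 2.0 * lam * proj * cav[i]  # gradient of the penalty term
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def make_toy_data(seed=0, n=100):
    """Latent dim 0: noisy true signal; dim 1: clean spurious artifact; dim 2: noise."""
    rng = random.Random(seed)
    acts, targets = [], []
    for k in range(n):
        y = float(k % 2)
        acts.append([y + rng.gauss(0, 0.5), y + rng.gauss(0, 0.1), rng.gauss(0, 0.3)])
        targets.append(y)
    # Concept samples: the artifact is switched on or off independently of the signal.
    with_art = [[rng.gauss(0.5, 0.5), 1.0 + rng.gauss(0, 0.1), rng.gauss(0, 0.3)]
                for _ in range(n)]
    without_art = [[rng.gauss(0.5, 0.5), rng.gauss(0, 0.1), rng.gauss(0, 0.3)]
                   for _ in range(n)]
    return acts, targets, with_art, without_art


acts, targets, with_art, without_art = make_toy_data()
cav = mean_difference_cav(with_art, without_art)
w_plain = train_head(acts, targets, cav, lam=0.0)  # no correction
w_pen = train_head(acts, targets, cav, lam=5.0)    # gradient-penalized
print("sensitivity without penalty:", sensitivity_penalty(w_plain, cav))
print("sensitivity with penalty:   ", sensitivity_penalty(w_pen, cav))
```

In this toy setting the uncorrected head leans on the artifact dimension because it is the cleaner predictor, while the penalized head is pushed towards the noisier but genuine signal. For a deep network, the gradient with respect to the latent activations would come from automatic differentiation rather than being read off the weights.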