Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space in the hope of enforcing the right reasons. We present a novel method that ensures the right reasons on the concept level by using the gradient to reduce the model's sensitivity towards biases. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures.
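The idea of reducing sensitivity towards a bias concept through the gradient can be illustrated with a minimal sketch. It is not the implementation described above: it assumes a toy linear head f(a) = w · a on latent activations a (so the gradient of f with respect to a is simply w), and it uses a difference-of-means direction as a stand-in for a robust Concept Activation Vector. All names (`mean_difference_cav`, `sensitivity_penalty`, `penalty_step`) and all data values are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_difference_cav(with_concept, without_concept):
    """Unit-norm direction pointing from the mean activation without the
    concept to the mean activation with it (an assumed stand-in for a
    robust concept direction)."""
    dim = len(with_concept[0])
    mu_pos = [sum(x[i] for x in with_concept) / len(with_concept) for i in range(dim)]
    mu_neg = [sum(x[i] for x in without_concept) / len(without_concept) for i in range(dim)]
    diff = [p - n for p, n in zip(mu_pos, mu_neg)]
    norm = math.sqrt(dot(diff, diff))
    return [d / norm for d in diff]


def sensitivity_penalty(w, cav):
    """Squared directional derivative of f(a) = w . a along the CAV.
    For this linear head the gradient w.r.t. a is w itself."""
    return dot(w, cav) ** 2


def penalty_step(w, cav, lr=0.1):
    """One gradient-descent step on the penalty alone:
    d/dw (w . h)^2 = 2 (w . h) h."""
    s = dot(w, cav)
    return [wi - lr * 2.0 * s * hi for wi, hi in zip(w, cav)]


# Toy latent activations: the bias concept shifts the first coordinate only.
biased = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2]]
clean = [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]]
cav = mean_difference_cav(biased, clean)

w = [1.5, 0.7]  # hypothetical head weights that rely on the biased coordinate
before = sensitivity_penalty(w, cav)
for _ in range(50):
    w = penalty_step(w, cav)
after = sensitivity_penalty(w, cav)

print("penalty before:", before)
print("penalty after: ", after)
print("weights after: ", w)
```

In this toy setting the penalty shrinks only the weight component along the concept direction and leaves the orthogonal component essentially unchanged. In practice such a penalty would be added to the task loss and computed on the gradients of a nonlinear network rather than a fixed weight vector.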