Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stakes decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method for model correction on the concept level that explicitly reduces model sensitivity towards biases via gradient penalization. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures. Code is available at https://github.com/frederikpahde/rrclarc.
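The idea of penalizing sensitivity along a concept direction can be illustrated with a small sketch. It is not the authors' implementation (see the repository linked above for that). It assumes the penalty takes the form of the squared directional derivative of the model output along a bias direction h in latent space, and it uses a hypothetical two-unit tanh head with made-up weights so that the gradient can be written out by hand. All names (`head`, `grad_head`, `bias_penalty`, `total_loss`, `lam`) are illustrative.

```python
import math

def head(a, W, v):
    """Toy model head: f(a) = sum_j v[j] * tanh(sum_i W[j][i] * a[i])."""
    return sum(
        vj * math.tanh(sum(wji * ai for wji, ai in zip(row, a)))
        for vj, row in zip(v, W)
    )

def grad_head(a, W, v):
    """Analytic gradient of head() with respect to the latent activations a."""
    grad = [0.0] * len(a)
    for vj, row in zip(v, W):
        z = sum(wji * ai for wji, ai in zip(row, a))
        scale = vj * (1.0 - math.tanh(z) ** 2)  # derivative of v_j * tanh(z)
        for i, wji in enumerate(row):
            grad[i] += scale * wji
    return grad

def bias_penalty(a, W, v, h):
    """Squared directional derivative of the output along the bias direction h."""
    g = grad_head(a, W, v)
    return sum(gi * hi for gi, hi in zip(g, h)) ** 2

def total_loss(task_loss, a, W, v, h, lam=1.0):
    """Task loss plus the weighted sensitivity penalty (lam is an assumed weight)."""
    return task_loss + lam * bias_penalty(a, W, v, h)

# Hypothetical values: a latent activation and a unit-norm bias direction
# (standing in for a Concept Activation Vector) along the first latent axis.
a = [0.5, -1.0, 0.25]
h = [1.0, 0.0, 0.0]
v = [1.0, -2.0]
W_sensitive = [[0.8, 0.1, -0.3], [-0.5, 0.4, 0.2]]  # reads the first latent axis
W_invariant = [[0.0, 0.1, -0.3], [0.0, 0.4, 0.2]]   # ignores the first latent axis

print("penalty, sensitive head:", bias_penalty(a, W_sensitive, v, h))
print("penalty, invariant head:", bias_penalty(a, W_invariant, v, h))
```

In this toy setting the penalty vanishes exactly when the head does not react to movement along h, which is the behavior such a regularizer rewards during fine-tuning. With a real network the gradient would come from automatic differentiation rather than a hand-derived formula.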