Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments. For instance, a model that associates cats with bed backgrounds can fail to recognize cats in environments without beds. Mitigating spurious correlations is crucial for building trustworthy models. However, existing methods lack the transparency needed to offer insight into the mitigation process. In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue. Using human-interpretable concepts, DISC iteratively 1) discovers concepts that are unstable across environments and treats them as spurious attributes, then 2) intervenes on the training data using the discovered concepts to reduce spurious correlations. Across systematic experiments, DISC provides better generalization and interpretability than existing approaches. Specifically, it outperforms state-of-the-art methods on an object recognition task and a skin-lesion classification task by 7.5% and 9.6%, respectively. Additionally, we offer theoretical analysis and guarantees to understand the benefits of models trained with DISC. Code and data are available at https://github.com/Wuyxin/DISC.
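To make the "discover" step concrete, here is a minimal sketch of one way to score how unstable a concept is across environments. It is an illustration under stated assumptions, not the DISC implementation: the function name `concept_instability`, the use of Pearson correlation, and the variance-across-environments score are all hypothetical choices, and the toy concepts ("bed", "whiskers") are invented for the example. The intervention step is not shown.

```python
import numpy as np

def concept_instability(scores, labels, envs):
    """Hypothetical instability score: variance, across environments, of the
    per-environment Pearson correlation between each concept and the label."""
    scores = np.asarray(scores, dtype=float)   # shape (n_samples, n_concepts)
    labels = np.asarray(labels, dtype=float)   # shape (n_samples,)
    envs = np.asarray(envs)                    # shape (n_samples,)
    per_env = []
    for e in np.unique(envs):
        mask = envs == e
        s = scores[mask] - scores[mask].mean(axis=0)  # center concept scores
        y = labels[mask] - labels[mask].mean()        # center labels
        denom = np.sqrt((s ** 2).sum(axis=0) * (y ** 2).sum())
        # Correlation is set to 0 where a concept or the label is constant.
        corr = np.divide(s.T @ y, denom,
                         out=np.zeros(scores.shape[1]), where=denom > 0)
        per_env.append(corr)
    return np.var(np.stack(per_env), axis=0)

# Toy data (assumed): "bed" flips its relation to the label between the two
# environments, while "whiskers" tracks the label in both.
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])
envs = np.array([0, 0, 0, 0, 1, 1, 1, 1])
bed = np.array([0, 1, 0, 1, 1, 0, 1, 0])
whiskers = labels.copy()
scores = np.stack([bed, whiskers], axis=1)

instability = concept_instability(scores, labels, envs)
ranked = np.argsort(-instability)  # most unstable concept first
```

In this toy setup the "bed" concept ranks as the most unstable, which is the kind of signal a discovery step could use to flag a spurious attribute before intervening on the training data.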