Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond the training environments. For instance, a model that associates cats with bed backgrounds can fail to detect cats in environments without beds. Mitigating spurious correlations is crucial for building trustworthy models. However, existing methods lack the transparency needed to offer insight into the mitigation process. In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue. Using human-interpretable concepts, DISC iteratively 1) discovers concepts that are unstable across different environments and treats them as spurious attributes, then 2) intervenes on the training data using the discovered concepts to reduce spurious correlations. Across systematic experiments, DISC provides better generalization and interpretability than existing approaches. Specifically, it outperforms state-of-the-art methods on an object recognition task and a skin-lesion classification task by 7.5% and 9.6%, respectively. Additionally, we offer theoretical analysis and guarantees to explain the benefits of models trained with DISC. Code and data are available at https://github.com/Wuyxin/DISC.
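To make the discovery step more concrete, the following is a minimal sketch of one way to flag a concept as unstable across environments. It is not taken from the DISC repository: the function names, the per-example concept scores, and the instability measure (variance across environments of the gap between the mean concept score on positive and on negative examples) are all assumptions chosen for illustration.

```python
from statistics import mean, pvariance


def concept_label_gap(concept_scores, labels):
    """Mean concept score on positive examples minus mean on negative examples."""
    pos = [s for s, y in zip(concept_scores, labels) if y == 1]
    neg = [s for s, y in zip(concept_scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each environment needs examples of both classes")
    return mean(pos) - mean(neg)


def concept_instability(envs):
    """Variance of the concept-label gap across environments, per concept.

    `envs` is a list of (scores_by_concept, labels) pairs, one per environment.
    This measure is an illustrative assumption, not the paper's metric.
    """
    concepts = envs[0][0].keys()
    return {
        c: pvariance([concept_label_gap(scores[c], labels) for scores, labels in envs])
        for c in concepts
    }


def rank_spurious(envs):
    """Concepts sorted from most to least unstable."""
    instability = concept_instability(envs)
    return sorted(instability, key=instability.get, reverse=True)


# Toy data (hypothetical): label 1 = cat present, label 0 = no cat.
# In environment A cats appear on beds; in environment B they do not.
env_a = ({"bed": [1.0, 1.0, 0.0, 0.0], "whiskers": [1.0, 1.0, 0.0, 0.0]}, [1, 1, 0, 0])
env_b = ({"bed": [0.0, 0.0, 1.0, 1.0], "whiskers": [1.0, 1.0, 0.0, 0.0]}, [1, 1, 0, 0])

print(rank_spurious([env_a, env_b]))
```

In this toy setting the "bed" concept flips its relationship with the label between environments while "whiskers" does not, so "bed" is ranked first as the candidate spurious attribute. The intervention step (using the flagged concepts to modify the training data) is not sketched here.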