Models driven by spurious correlations often generalize poorly. We propose the counterfactual (CF) alignment method to detect and quantify spurious correlations in black-box classifiers. The method generates counterfactual images with respect to one classifier and feeds them into other classifiers to see whether they also induce changes in those classifiers' outputs. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists. We validate the method by observing intuitive trends in face-attribute and waterbird classifiers, and by fabricating spurious correlations and detecting their presence both visually and quantitatively. Furthermore, using the CF alignment method, we demonstrate that robust optimization methods (GroupDRO, JTT, and FLAC) can be evaluated by detecting a reduction in spurious correlations.
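The core idea of the abstract (perturb an input so that one classifier's output changes, then check whether other classifiers respond in step) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy linear classifiers, the helper names `linear_classifier`, `counterfactual_sequence`, and `cf_alignment`, the simple input shift standing in for counterfactual image generation, and the use of Pearson correlation as the alignment score. The abstract does not specify how counterfactuals are generated or which statistic quantifies the relationship.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def linear_classifier(weights):
    """Toy black-box classifier (hypothetical): sigmoid of a linear score."""
    w = np.asarray(weights, dtype=float)
    return lambda x: float(sigmoid(np.dot(x, w)))


def counterfactual_sequence(x, base_weights, steps):
    """Toy stand-in for CF image generation (an assumption, not the
    paper's generator): shift x so the base classifier's output drops."""
    w = np.asarray(base_weights, dtype=float)
    direction = w / np.linalg.norm(w)
    return [np.asarray(x, dtype=float) - s * direction for s in steps]


def cf_alignment(base_clf, other_clf, cf_inputs):
    """Pearson correlation between the two classifiers' outputs along the
    counterfactual sequence; 0.0 if either output never changes."""
    base_out = np.array([base_clf(x) for x in cf_inputs])
    other_out = np.array([other_clf(x) for x in cf_inputs])
    if np.std(base_out) < 1e-12 or np.std(other_out) < 1e-12:
        return 0.0
    return float(np.corrcoef(base_out, other_out)[0, 1])


# Counterfactuals are generated with respect to the base classifier only.
x = np.array([1.0, 0.5, -0.3])
base_w = [1.0, 0.0, 0.0]
base = linear_classifier(base_w)
cfs = counterfactual_sequence(x, base_w, np.linspace(0.0, 4.0, 9))

# Other classifiers: one sharing the base feature, one unrelated, one opposed.
others = {
    "shares_feature": [1.0, 0.5, 0.0],
    "unrelated": [0.0, 1.0, 0.0],
    "opposed": [-1.0, 0.0, 0.0],
}
scores = {name: cf_alignment(base, linear_classifier(w), cfs)
          for name, w in others.items()}
print(scores)
```

In this toy setup, a score near +1 or -1 indicates that the second classifier reacts to the same change that moved the base classifier, which is the kind of signal one would inspect for a possible spurious correlation; a score of 0 indicates no response along the sequence.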