The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data. For example, a classifier may misclassify dog breeds based on the image background: if a background seen at test time was correlated with a different breed in the training data, the classifier may predict that breed instead. Previous SC benchmark datasets suffer from various issues, e.g., over-saturation or containing only one-to-one (O2O) SCs and no many-to-many (M2M) SCs arising between groups of spurious attributes and classes. In this paper, we present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations between dog breeds and background locations. To create this dataset, we employ a text-to-image model to generate photo-realistic images and an image captioning model to filter out unsuitable ones. The resulting dataset is of high quality and contains approximately 152,000 images. Our experimental results demonstrate that state-of-the-art group robustness methods struggle with Spawrious, most notably on the Hard splits, where they achieve $<60\%$ accuracy. By examining model misclassifications, we detect reliance on spurious backgrounds, demonstrating that our dataset poses a significant challenge that can drive future research.
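To make the one-to-one (O2O) failure mode concrete, the following is a minimal sketch in plain Python, not the construction used for Spawrious. The breed and background names, the pairings in `TRAIN_BG` and `TEST_BG`, and all helper functions are hypothetical choices made for illustration. Each sample is reduced to a `(background, breed)` pair, and the "classifier" is a majority-vote lookup that sees only the background, standing in for a model that has latched onto the spurious feature.

```python
from collections import Counter, defaultdict

# Hypothetical class and background names, chosen only for illustration.
BREEDS = ["bulldog", "corgi", "dachshund", "labrador"]

# Assumed O2O pairing: each breed appears with exactly one background in training.
TRAIN_BG = {"bulldog": "desert", "corgi": "jungle",
            "dachshund": "snow", "labrador": "beach"}

# Assumed test pairing: the same backgrounds, but attached to different breeds.
TEST_BG = {"bulldog": "jungle", "corgi": "snow",
           "dachshund": "beach", "labrador": "desert"}


def make_split(breed_to_bg, n_per_class):
    """Build a toy split of (background, breed) pairs."""
    return [(breed_to_bg[b], b) for b in BREEDS for _ in range(n_per_class)]


def fit_background_shortcut(samples):
    """Learn a background -> breed rule by majority vote over the training pairs."""
    votes = defaultdict(Counter)
    for background, breed in samples:
        votes[background][breed] += 1
    return {bg: counts.most_common(1)[0][0] for bg, counts in votes.items()}


def accuracy(rule, samples):
    """Fraction of samples whose breed matches the rule's prediction."""
    correct = sum(rule.get(background) == breed for background, breed in samples)
    return correct / len(samples)


train = make_split(TRAIN_BG, n_per_class=5)
test = make_split(TEST_BG, n_per_class=5)
shortcut = fit_background_shortcut(train)

print("train accuracy:", accuracy(shortcut, train))
print("test accuracy:", accuracy(shortcut, test))
```

Under these assumptions the background rule fits the training split perfectly and fails on every test sample, because each test background was paired with a different breed during training. A real image classifier can also use the dog itself, so its degradation would typically be partial rather than total.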