We propose a method for generating spurious features by leveraging large-scale text-to-image diffusion models. Previous work detected spurious features in large-scale datasets such as ImageNet and introduced Spurious ImageNet; however, we found that not all of its spurious images are spurious across different classifiers. Although spurious images help measure a classifier's reliance on spurious features, filtering large numbers of images from the Internet to find more spurious features is time-consuming. To this end, we use an existing approach for personalizing large-scale text-to-image diffusion models with the spurious images already discovered, and we propose a new spurious feature similarity loss based on the neural features of an adversarially robust model. Specifically, we fine-tune Stable Diffusion on several reference images from Spurious ImageNet with a modified objective that incorporates the proposed spurious feature similarity loss. Experimental results show that our method can generate spurious images that are consistently spurious across different classifiers. Moreover, the generated spurious images are visually similar to the reference images from Spurious ImageNet.
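As a rough illustration of how a modified fine-tuning objective of this kind could be assembled, the sketch below adds a feature-similarity term to a standard denoising loss. The abstract does not give the exact form of the loss, so everything here is an assumption made for illustration: the cosine-based similarity term, the weighting parameter `weight`, and the function names are hypothetical, and plain Python lists stand in for the noise tensors and for the neural features of the adversarially robust model.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def combined_loss(noise_pred, noise_true, feat_gen, feat_ref, weight=0.1):
    """Denoising loss plus a weighted feature-similarity penalty.

    ASSUMPTION: the penalty is taken to be 1 - cosine similarity between
    the features of a generated image (feat_gen) and those of a reference
    spurious image (feat_ref). The source does not specify this form.
    """
    denoising = mse(noise_pred, noise_true)
    similarity_penalty = 1.0 - cosine_similarity(feat_gen, feat_ref)
    return denoising + weight * similarity_penalty
```

In this sketch the penalty vanishes when the generated and reference features point in the same direction, so the added term only pushes the model when its outputs drift away from the reference spurious feature; `weight` trades this off against the usual denoising term.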