We aim to diagnose potential biases in image classifiers. To this end, prior works manually labeled biased attributes or visualized biased features, which incurs high annotation costs or yields results that are often ambiguous to interpret. Instead, we leverage two types of pre-trained vision-language models, generative and discriminative, to describe a visual bias as a word. Specifically, we propose bias-to-text (B2T), which generates captions of the mispredicted images using a pre-trained captioning model and extracts the common keywords that may describe visual biases. Then, we categorize the bias type as spurious correlation or majority bias by checking whether it is specific or agnostic to the class, based on the similarity between the class-wise mispredicted images and the keyword in a pre-trained vision-language joint embedding space, e.g., CLIP. We demonstrate that this simple and intuitive scheme can recover well-known gender and background biases and discover novel ones in real-world datasets. Moreover, we use B2T to compare classifiers with different architectures or training methods. Finally, we show that one can obtain debiased classifiers using the B2T bias keywords and CLIP, in both zero-shot and full-shot settings, without any human annotation of the bias.
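As a rough illustration of the two steps described above (keyword extraction from captions of mispredicted images, then class-specific versus class-agnostic categorization), the following minimal sketch may help. It is not the authors' implementation: the function names `common_keywords` and `categorize_bias`, the stopword list, the threshold of 0.5, and the decision rule are all assumptions made for illustration, and short hand-written vectors stand in for real CLIP embeddings.

```python
from collections import Counter
import math
import re

# Hypothetical, tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "and", "is", "at", "to"}


def common_keywords(captions, top_k=3):
    """Return the top_k words that appear in the most captions.

    Each word is counted at most once per caption, so a keyword ranks
    high only if it recurs across many mispredicted images.
    """
    counts = Counter()
    for caption in captions:
        tokens = set(re.findall(r"[a-z]+", caption.lower())) - STOPWORDS
        counts.update(sorted(tokens))  # sorted for deterministic tie order
    return [word for word, _ in counts.most_common(top_k)]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def categorize_bias(keyword_vec, mispredicted_by_class, threshold=0.5):
    """Label a keyword by how many classes' mispredicted images it matches.

    Assumed rule (illustrative only): a keyword similar to the mispredicted
    images of only some classes is class-specific ("spurious correlation");
    one similar to those of every class is class-agnostic ("majority bias").
    """
    aligned = []
    for label, image_vecs in mispredicted_by_class.items():
        mean_sim = sum(cosine(keyword_vec, v) for v in image_vecs) / len(image_vecs)
        if mean_sim >= threshold:
            aligned.append(label)
    if not aligned:
        return "no clear bias"
    if len(aligned) == len(mispredicted_by_class):
        return "majority bias"
    return "spurious correlation"


if __name__ == "__main__":
    # Toy captions standing in for the output of a captioning model.
    captions = [
        "a man holding a fish on a boat",
        "a man standing in a forest",
        "a man with a dog",
        "a woman in a forest",
    ]
    print(common_keywords(captions, top_k=2))

    # Toy 2-D vectors standing in for joint-embedding-space features.
    keyword_vec = [1.0, 0.0]
    by_class = {
        "class_a": [[1.0, 0.1], [0.9, 0.0]],
        "class_b": [[0.0, 1.0], [0.1, 1.0]],
    }
    print(categorize_bias(keyword_vec, by_class))
```

In practice, the captions would come from a pre-trained captioning model and the vectors from a joint embedding model such as CLIP. The frequency-based keyword ranking and the fixed threshold are simplifications chosen to keep the sketch self-contained.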