CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No

Out-of-distribution (OOD) detection refers to training a model on an in-distribution (ID) dataset so that it can determine whether input images come from unknown classes. Considerable effort has been invested in designing OOD detection methods based on either convolutional neural networks or transformers. However, zero-shot OOD detection methods driven by CLIP, which require only the ID class names, have received less attention. This paper presents a novel method, CLIP saying no (CLIPN), which equips CLIP with the logic of saying no. Our key motivation is to enable CLIP to distinguish OOD from ID samples using positive-semantic prompts and negation-semantic prompts. Specifically, we design a novel learnable "no" prompt and a "no" text encoder to capture negation semantics within images. We then introduce two loss functions, the image-text binary-opposite loss and the text semantic-opposite loss, which teach CLIPN to associate images with "no" prompts and thereby identify unknown samples. Furthermore, we propose two threshold-free inference algorithms that perform OOD detection by utilizing the negation semantics from the "no" prompts and the "no" text encoder. Experimental results on 9 benchmark datasets (3 ID datasets and 6 OOD datasets) for the OOD detection task demonstrate that CLIPN, based on ViT-B-16, outperforms 7 widely used algorithms by at least 2.34% in AUROC and 11.64% in FPR95 for zero-shot OOD detection on ImageNet-1K. CLIPN can serve as a solid foundation for effectively leveraging CLIP in downstream OOD tasks. The code is available at https://github.com/xmed-lab/CLIPN.
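The abstract names two threshold-free inference algorithms but does not spell them out. As a rough illustration of how a pair of positive and negation prompts could yield a decision without a tuned score threshold, the sketch below picks the best-matching class from the positive prompts and then lets that class's "no" prompt compete against its positive prompt. This is one plausible reading, not the paper's algorithm: the function name `detect_ood`, the temperature value 0.07, and the toy 2-D feature vectors (standing in for CLIP image and text embeddings) are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def detect_ood(image_feat, yes_feats, no_feats, temperature=0.07):
    """Return (is_ood, best_class_index) for one image.

    Hypothetical decision rule (an assumption, not taken from the paper):
    1. choose the class whose positive prompt matches the image best;
    2. flag the image as OOD if that class's "no" prompt matches the
       image better than its positive prompt does.
    """
    yes_logits = [cosine(image_feat, t) / temperature for t in yes_feats]
    no_logits = [cosine(image_feat, t) / temperature for t in no_feats]

    p_yes = softmax(yes_logits)
    best = max(range(len(p_yes)), key=p_yes.__getitem__)

    # Two-way softmax between the positive and "no" prompt of the winning
    # class. Comparing against 0.5 just asks which of the two prompts wins,
    # so no dataset-specific threshold has to be tuned.
    p_no = softmax([yes_logits[best], no_logits[best]])[1]
    return p_no > 0.5, best


# Toy features for two ID classes (assumed values, not real CLIP outputs).
yes_feats = [[1.0, 0.1], [0.0, 1.0]]
no_feats = [[0.0, 1.0], [-1.0, 0.0]]

id_result = detect_ood([1.0, 0.0], yes_feats, no_feats)
ood_result = detect_ood([-1.0, 0.2], yes_feats, no_feats)
```

In the toy data, the first image aligns with the positive prompt of class 0 and is kept as ID, while the second image aligns more closely with the "no" prompt of its best-matching class and is flagged as OOD.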
