CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No

Out-of-distribution (OOD) detection refers to training a model on an in-distribution (ID) dataset so that it can determine whether an input image comes from an unknown class. Considerable effort has been invested in designing OOD detection methods based on either convolutional neural networks or transformers. However, zero-shot OOD detection methods driven by CLIP, which require only the class names of the ID data, have received less attention. This paper presents a novel method, CLIP saying "no" (CLIPN), which equips CLIP with the logic of saying "no". Our key motivation is to give CLIP the capability of distinguishing OOD samples from ID samples using positive-semantic prompts and negation-semantic prompts. Specifically, we design a novel learnable "no" prompt and a "no" text encoder to capture negation semantics within images. We then introduce two loss functions, the image-text binary-opposite loss and the text semantic-opposite loss, which teach CLIPN to associate images with "no" prompts and thereby enable it to identify unknown samples. Furthermore, we propose two threshold-free inference algorithms that perform OOD detection by using the negation semantics from the "no" prompts and the "no" text encoder. Experimental results on 9 benchmark datasets (3 ID datasets and 6 OOD datasets) for the OOD detection task demonstrate that CLIPN, based on ViT-B-16, outperforms 7 widely used algorithms by at least 2.34% in AUROC and 11.64% in FPR95 for zero-shot OOD detection on ImageNet-1K. CLIPN can serve as a solid foundation for effectively leveraging CLIP in downstream OOD tasks. The code is available at https://github.com/xmed-lab/CLIPN.
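To make the idea of a threshold-free decision concrete, the following is a minimal sketch of how a standard prompt and a "no" prompt might be compared for a single image. The abstract does not spell out the inference rule, so everything here is an assumption made for illustration: the function name `detect_ood`, the temperature value, the made-up similarity scores, and the rule itself (pick the best-matching class, then flag the image as OOD if the "no" prompt for that class scores higher than the standard prompt). It is not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def detect_ood(sim_yes, sim_no, temperature=0.01):
    """Return (is_ood, best_class_index) for one image.

    sim_yes[j]: image-text similarity with the standard prompt of class j.
    sim_no[j]:  image-text similarity with the "no" prompt of class j.
    Both the inputs and the decision rule are hypothetical (see lead-in).
    """
    if not sim_yes or len(sim_yes) != len(sim_no):
        raise ValueError("sim_yes and sim_no must be non-empty and equal in length")

    # Step 1: class probabilities from the standard prompts only.
    p_class = softmax([s / temperature for s in sim_yes])
    best = max(range(len(p_class)), key=p_class.__getitem__)

    # Step 2: two-way softmax between "yes" and "no" for the winning class.
    p_yes, p_no = softmax([sim_yes[best] / temperature,
                           sim_no[best] / temperature])

    # Step 3: no tuned threshold; the two prompts compete directly.
    return p_no > p_yes, best


# Made-up similarity scores for a 3-class ID label set.
print(detect_ood([0.31, 0.22, 0.18], [0.12, 0.25, 0.27]))  # "yes" wins for class 0
print(detect_ood([0.20, 0.19, 0.18], [0.29, 0.25, 0.27]))  # "no" wins for class 0
```

Because the decision compares two probabilities that sum to one, there is no score cutoff to calibrate on held-out data, which is the sense in which a rule of this kind is threshold-free.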
