We present a novel vision-language prompt learning approach for few-shot out-of-distribution (OOD) detection. Few-shot OOD detection aims to detect OOD images from classes unseen during training, using only a few labeled in-distribution (ID) images. While prompt learning methods such as CoOp are effective and efficient for few-shot ID classification, they remain limited in OOD detection because the learned text embeddings may contain ID-irrelevant information. To address this issue, we introduce \textbf{Lo}cal regularized \textbf{Co}ntext \textbf{Op}timization (LoCoOp), which performs OOD regularization by using portions of CLIP's local features as OOD features during training. CLIP's local features contain many ID-irrelevant nuisances (e.g., backgrounds); by learning to push them away from the ID class text embeddings, we remove these nuisances from the ID class text embeddings and improve the separation between ID and OOD. Experiments on the large-scale ImageNet OOD detection benchmarks show that LoCoOp outperforms zero-shot and fully supervised detection methods as well as prompt learning methods. Notably, even in the one-shot setting (just one label per class), LoCoOp outperforms existing zero-shot and fully supervised detection methods. The code will be available at \url{https://github.com/AtsuMiyai/LoCoOp}.
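The regularization idea above, pushing ID-irrelevant local features away from the ID class text embeddings, can be sketched in a few lines. This is a minimal illustration and not the released implementation: the abstract does not say how the ID-irrelevant regions are chosen or which loss is used, so the top-k selection rule, the entropy-based penalty, the temperature value, and the names `ood_regularization`, `local_feats`, and `text_embs` are all assumptions made for this sketch. Plain Python lists stand in for CLIP features.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ood_regularization(local_feats, text_embs, label, top_k=1, temperature=0.07):
    """Return (negative mean entropy, indices of selected regions).

    local_feats: per-region image features (list of vectors)
    text_embs:   one text embedding per ID class (list of vectors)
    label:       index of the ground-truth ID class for this image

    ASSUMPTION: a region counts as ID-irrelevant when `label` is not among
    its top_k most similar classes. The abstract does not state this rule.
    """
    selected, entropies = [], []
    for idx, feat in enumerate(local_feats):
        sims = [cosine(feat, t) for t in text_embs]
        probs = softmax(sims, temperature)
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if label in ranked[:top_k]:
            continue  # region looks like the ID object; leave it alone
        selected.append(idx)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    if not selected:
        return 0.0, selected
    # Minimizing this value maximizes entropy: the class distribution of the
    # selected regions is flattened so that none resembles an ID class.
    return -sum(entropies) / len(entropies), selected


# Toy example: two ID classes, three image regions, ground-truth class 0.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
local_feats = [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]
loss, regions = ood_regularization(local_feats, text_embs, label=0)
print(regions)  # regions treated as ID-irrelevant under the assumed rule
```

In a training loop, a term like this would presumably be added to the usual classification loss with a weighting coefficient; that coefficient and the exact combination are likewise not specified in the abstract.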