Adapting Contrastive Language-Image Pretrained (CLIP) Models for Out-of-Distribution Detection

We present a comprehensive experimental study on pretrained feature extractors for visual out-of-distribution (OOD) detection, focusing on adapting contrastive language-image pretrained (CLIP) models. Without fine-tuning on the training data, we establish a positive correlation ($R^2\geq0.92$) between in-distribution classification and unsupervised OOD detection for CLIP models on $4$ benchmarks. We further propose a new, simple, and scalable method called \textit{pseudo-label probing} (PLP) that adapts vision-language models for OOD detection. Given the label names of the training set, PLP trains a linear layer using pseudo-labels derived from the text encoder of CLIP. To test the OOD detection robustness of pretrained models, we develop a novel feature-based adversarial OOD data manipulation approach to create adversarial samples. Intriguingly, we show that (i) PLP outperforms the previous state of the art \citep{ming2022mcm} on all $5$ large-scale benchmarks based on ImageNet, with an average AUROC gain of 3.4\% using the largest CLIP model (ViT-G); (ii) linear probing outperforms fine-tuning by large margins for CLIP architectures (e.g., CLIP ViT-H achieves a mean AUROC gain of 7.3\% across all ImageNet-based benchmarks); and (iii) billion-parameter CLIP models still fail to detect adversarially manipulated OOD images. The code and adversarially created datasets will be made publicly available.
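The abstract describes PLP only at a high level: pseudo-labels come from CLIP's text encoder, and a linear layer is trained on them. The NumPy sketch below illustrates that pipeline under several assumptions that are not from the abstract. Random unit vectors stand in for CLIP image and text embeddings, so no CLIP model is loaded. Pseudo-labels are assumed to be a temperature-scaled softmax over image-text cosine similarities. The linear layer is trained with full-batch gradient descent on soft-target cross-entropy. The OOD score is assumed to be the probe's maximum softmax probability (MSP). All function names (`pseudo_labels`, `train_linear_probe`, `msp_score`, `auroc`) and hyperparameters are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # CLIP embeddings are compared by cosine similarity, so project them onto the unit sphere
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pseudo_labels(image_feats, text_feats, temperature=0.01):
    """Soft pseudo-labels: softmax over image-text cosine similarities (assumed form)."""
    sims = l2_normalize(image_feats) @ l2_normalize(text_feats).T  # shape (n_images, n_classes)
    return softmax(sims / temperature, axis=1)


def train_linear_probe(feats, targets, lr=0.5, epochs=200):
    """Fit logits = feats @ W + b with soft-target cross-entropy and full-batch gradient descent."""
    n, d = feats.shape
    k = targets.shape[1]
    W = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        probs = softmax(feats @ W + b, axis=1)
        grad = (probs - targets) / n  # d(mean cross-entropy)/d(logits)
        W -= lr * feats.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def msp_score(feats, W, b):
    """Maximum softmax probability of the probe; higher means 'more in-distribution'."""
    return softmax(feats @ W + b, axis=1).max(axis=1)


def auroc(scores_in, scores_out):
    """AUROC = P(ID score > OOD score), counting ties as 1/2."""
    s_in = np.asarray(scores_in, dtype=float)[:, None]
    s_out = np.asarray(scores_out, dtype=float)[None, :]
    return float(np.mean((s_in > s_out) + 0.5 * (s_in == s_out)))


# Synthetic stand-ins for CLIP embeddings (hypothetical data, not real CLIP features)
rng = np.random.default_rng(0)
d, k = 64, 5
text_feats = l2_normalize(rng.normal(size=(k, d)))  # one "label-name" embedding per class


def sample_id(n):
    # ID images lie near the text embedding of their (hidden) class
    y = rng.integers(0, k, size=n)
    return l2_normalize(text_feats[y] + 0.1 * rng.normal(size=(n, d))), y


id_train, y_train = sample_id(500)
id_test, _ = sample_id(200)
ood_test = l2_normalize(rng.normal(size=(300, d)))  # OOD: random directions

targets = pseudo_labels(id_train, text_feats)  # no ground-truth labels are used for training
W, b = train_linear_probe(id_train, targets)

pl_acc = float((targets.argmax(axis=1) == y_train).mean())
roc = auroc(msp_score(id_test, W, b), msp_score(ood_test, W, b))
print(f"pseudo-label accuracy: {pl_acc:.3f}, AUROC: {roc:.3f}")
```

With real CLIP features, `text_feats` would instead hold text-encoder embeddings of prompts built from the training label names, and `id_train` would hold image-encoder embeddings of the training images. The ground-truth labels `y_train` are used only to report pseudo-label accuracy. They never enter the probe's training.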
