This paper considers zero-shot anomaly detection (AD), i.e., performing AD without any reference images of the test objects. We propose a framework named CLIP-AD that leverages the zero-shot capabilities of the large vision-language model CLIP. First, we reinterpret text prompt design from a distributional perspective and propose a Representative Vector Selection (RVS) paradigm to obtain improved text features. Second, we observe opposite predictions and irrelevant highlights when anomaly maps are computed directly. To address these issues, we introduce a Staged Dual-Path model (SDP) that leverages features from multiple levels and applies architecture and feature surgery. Finally, delving into these two phenomena, we show that the image and text features are not aligned in the joint embedding space. We therefore introduce a fine-tuning strategy that adds linear layers, yielding an extended model, SDP+, which further improves performance. Extensive experiments demonstrate the effectiveness of our approach; e.g., on MVTec-AD, SDP outperforms the SOTA WinCLIP by +4.2/+10.7 on the segmentation metrics F1-max/PRO, while SDP+ achieves +8.3/+20.5 improvements.
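To make the setting concrete, the sketch below illustrates the generic CLIP-style zero-shot scoring that frameworks like ours build on: patch features are compared against a "normal" and an "anomalous" text feature by cosine similarity, and a softmax over the pair gives a per-patch anomaly score. The representative-vector helper here simply uses a normalized mean of a prompt ensemble; this is only one possible choice for illustration, not the paper's actual RVS selection rule, and all function names are hypothetical.

```python
import numpy as np

def representative_vector(prompt_embeddings):
    """Collapse an ensemble of prompt embeddings into one text feature.
    Here we use the normalized mean as a simple stand-in; the RVS
    paradigm in the paper may select a representative differently."""
    v = prompt_embeddings.mean(axis=0)
    return v / np.linalg.norm(v)

def anomaly_map(patch_feats, text_feats, temperature=100.0):
    """Generic zero-shot anomaly scoring: cosine similarity between
    L2-normalized patch features (N, D) and two text features (2, D)
    for [normal, anomalous], softmax over the pair; the probability
    assigned to the anomalous prompt is the per-patch anomaly score."""
    patches = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    texts = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    logits = temperature * patches @ texts.T        # (N, 2), CLIP-style scaling
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = e / e.sum(axis=-1, keepdims=True)
    return probs[:, 1]                              # anomalous-class probability
```

Scores lie in [0, 1] per patch and can be reshaped to the feature-map grid to form a segmentation heatmap; the paper's contributions (RVS, SDP's multi-level surgery, SDP+'s linear fine-tuning) refine the text features and the image features entering this comparison.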