In this technical report, we briefly introduce our solution for the Zero/Few-shot Track of the Visual Anomaly and Novelty Detection (VAND) 2023 Challenge. For industrial visual inspection, building a single model that can be rapidly adapted to numerous categories with no or only a few normal reference images is a promising research direction, primarily because of the vast variety of product types. For the zero-shot track, we propose a solution based on the CLIP model that adds extra linear layers. These layers map the image features to the joint embedding space so that they can be compared with the text features to generate anomaly maps. In addition, when reference images are available, we use multiple memory banks to store their features and compare them with the features of the test images during the testing phase. In this challenge, our method achieved first place in the zero-shot track and excelled especially in segmentation, where its F1 score exceeded that of the second-ranked participant by 0.0489. In the few-shot track, we secured fourth place overall, and our classification F1 score of 0.8687 ranked first among all participating teams.
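The two scoring ideas described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the report's implementation. It assumes that patch features from the CLIP image encoder and embeddings of a "normal" and an "abnormal" text prompt have already been computed. It also assumes that the projection weights stand in for the trained extra linear layers (training is not shown). The function names, the two-prompt softmax, the temperature of 0.07, the "1 minus the maximum cosine similarity" memory-bank score, and the averaging across banks are all hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Scale to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def linear(weight: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    # Hypothetical stand-in for a trained linear layer: weight is out_dim x in_dim.
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def zero_shot_anomaly_map(patch_features, projection, normal_text, abnormal_text,
                          temperature: float = 0.07) -> Vector:
    """Zero-shot: per-patch probability of 'abnormal' (assumed two-prompt softmax)."""
    scores = []
    for feat in patch_features:
        z = linear(projection, feat)  # map the patch feature into the joint space
        s_n = cosine(z, normal_text) / temperature
        s_a = cosine(z, abnormal_text) / temperature
        m = max(s_n, s_a)  # subtract the max for a numerically stable softmax
        e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
        scores.append(e_a / (e_n + e_a))
    return scores


def memory_bank_anomaly_map(test_patches, memory_bank) -> Vector:
    """Few-shot: 1 - max cosine similarity to any stored reference patch (assumed score)."""
    return [1.0 - max(cosine(p, m) for m in memory_bank) for p in test_patches]


def multi_bank_anomaly_map(test_patches_per_layer, banks_per_layer) -> Vector:
    """Assumption: one bank per feature layer, with per-patch scores averaged over banks."""
    maps = [memory_bank_anomaly_map(p, b)
            for p, b in zip(test_patches_per_layer, banks_per_layer)]
    return [sum(vals) / len(maps) for vals in zip(*maps)]
```

For example, with an identity projection, the normal prompt `[1.0, 0.0]`, and the abnormal prompt `[0.0, 1.0]`, the patch `[0.0, 1.0]` receives a zero-shot score near 1. Against a memory bank holding only `[1.0, 0.0]`, it scores `1.0`, while a patch identical to the stored reference scores `0.0`. Both functions return one score per patch, so reshaping the list to the patch grid yields an anomaly map.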