Few-shot anomaly detection methods can effectively address the difficulty of collecting data in industrial scenarios. Compared with 2D few-shot anomaly detection (2D-FSAD), 3D few-shot anomaly detection (3D-FSAD) remains a largely unexplored but essential task. In this paper, we propose CLIP3D-AD, an efficient 3D-FSAD method built on CLIP that transfers the strong generalization ability of CLIP to 3D-FSAD. Specifically, we synthesize anomalous images from the given normal images to form sample pairs, which are used to adapt CLIP for 3D anomaly classification and segmentation. For classification, we introduce an image adapter and a text adapter to fine-tune the global visual features and the text features. Meanwhile, we propose a coarse-to-fine decoder to fuse and enhance the intermediate multi-layer visual representations of CLIP. To benefit from the geometric information of the point cloud while eliminating the modality and data discrepancies that arise when it is processed by CLIP, we project and render the point cloud into multi-view normal and anomalous images. We then design a multi-view fusion module to fuse the features that CLIP extracts from the multi-view images; the fused features are used to enhance the visual representations and thereby further strengthen the vision-language correlation. Extensive experiments demonstrate that our method achieves competitive performance in 3D few-shot anomaly classification and segmentation on the MVTec-3D AD dataset.
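To make the classification path more concrete, the following is a minimal sketch of the general idea of adapter-based, CLIP-style anomaly scoring. It is not the CLIP3D-AD implementation. Every name in it (`residual_adapter`, `fuse_views`, `anomaly_score`), the residual blending weight `alpha`, the mean pooling used for multi-view fusion, and the softmax temperature are assumptions made for illustration. The feature vectors are toy stand-ins for CLIP image and text embeddings.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def residual_adapter(feature, weight, alpha=0.5):
    """Hypothetical adapter: a linear map blended with the original feature.

    `weight` is a square matrix (list of rows); `alpha` is an assumed blend ratio.
    """
    adapted = [sum(w * x for w, x in zip(row, feature)) for row in weight]
    return [alpha * a + (1 - alpha) * x for a, x in zip(adapted, feature)]


def fuse_views(view_features):
    """Assumed fusion rule: average the per-view feature vectors."""
    n = len(view_features)
    return [sum(column) / n for column in zip(*view_features)]


def anomaly_score(image_feature, text_normal, text_anomalous, temperature=0.07):
    """Softmax over cosine similarities to a 'normal' and an 'anomalous' text feature.

    Returns the probability assigned to the anomalous text feature.
    """
    img = l2_normalize(image_feature)
    logits = [
        sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
        for t in (text_normal, text_anomalous)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    return exps[1] / sum(exps)


# Toy usage: two rendered views whose features lie close to the "normal" text feature.
identity = [[1.0, 0.0], [0.0, 1.0]]  # untrained adapter (identity map)
views = [[0.9, 0.1], [0.7, 0.3]]
fused = fuse_views(views)
feature = residual_adapter(fused, identity)
score = anomaly_score(feature, text_normal=[1.0, 0.0], text_anomalous=[0.0, 1.0])
```

In this toy setting the fused feature points mostly along the normal text direction, so `score` falls below 0.5. In a trained system, the adapter weights would be learned from the synthesized normal and anomalous sample pairs rather than fixed to the identity.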