Existing industrial anomaly detection methods primarily focus on unsupervised learning with clean RGB images. Yet both RGB and 3D data are crucial for anomaly detection, and datasets are seldom completely clean in practical scenarios. To address these challenges, this paper presents a first in-depth study of RGB-3D multi-modal noisy anomaly detection and proposes a novel noise-resistant framework, M3DM-NR, that leverages the strong multi-modal discriminative capabilities of CLIP. M3DM-NR consists of three stages. Stage-I introduces the Suspected References Selection module, which selects a small number of normal samples from the training dataset using the multi-modal features extracted by the Initial Feature Extraction module, and the Suspected Anomaly Map Computation module, which generates a suspected anomaly map that highlights abnormal regions for use as a reference. Stage-II takes the suspected anomaly maps of the reference samples as a reference, together with image, point cloud, and text inputs, and denoises the training samples through intra-modal comparison and multi-scale aggregation operations. Finally, Stage-III introduces the Point Feature Alignment, Unsupervised Feature Fusion, Noise Discriminative Coreset Selection, and Decision Layer Fusion modules to learn the pattern of the training dataset, enabling anomaly detection and segmentation while filtering out noise. Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in RGB-3D multi-modal noisy anomaly detection.
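
To make the idea behind Stage-I concrete, the following is a minimal sketch of how a few likely-normal reference samples could be picked from a noisy training set. It is an illustration under stated assumptions, not the M3DM-NR implementation: the abstract does not specify the selection criterion, so the sketch assumes a simple k-nearest-neighbour outlier score over precomputed feature vectors. The function names `knn_outlier_scores` and `select_suspected_references`, the parameters `k` and `num_refs`, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence


def knn_outlier_scores(features: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Mean Euclidean distance from each sample to its k nearest neighbours.

    Assumption: a sample far from its neighbours in feature space is more
    likely to be noisy (anomalous). This is not the paper's stated criterion.
    """
    n = len(features)
    if n < 2:
        raise ValueError("need at least two samples")
    k = min(k, n - 1)
    scores = []
    for i in range(n):
        # Distances from sample i to every other sample, smallest first.
        dists = sorted(
            math.dist(features[i], features[j]) for j in range(n) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def select_suspected_references(
    features: Sequence[Sequence[float]], num_refs: int, k: int = 3
) -> List[int]:
    """Return the indices of the num_refs samples with the lowest outlier score."""
    scores = knn_outlier_scores(features, k)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[:num_refs])


# Toy data: five tightly clustered "normal" feature vectors and one outlier.
toy_features = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
    [5.0, 5.0],  # stands in for a noisy (anomalous) training sample
]
reference_ids = select_suspected_references(toy_features, num_refs=3, k=2)
```

In this toy run the outlier at index 5 receives by far the largest score and is therefore never chosen as a reference. In a real pipeline the feature vectors would come from pretrained image and point cloud encoders rather than hand-written lists.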