This paper explores the industrial multimodal Anomaly Detection (AD) task, which exploits point clouds and RGB images to localize anomalies. We introduce a novel, lightweight, and fast framework that learns to map features from one modality to the other on nominal samples. At test time, anomalies are detected by pinpointing inconsistencies between observed and mapped features. Extensive experiments show that our approach achieves state-of-the-art detection and segmentation performance in both the standard and few-shot settings on the MVTec 3D-AD dataset, while running faster and occupying less memory than previous multimodal AD methods. Moreover, we propose a layer-pruning technique that further improves memory and time efficiency with only a marginal sacrifice in performance.
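The crossmodal mapping idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a closed-form linear map stands in for the learned lightweight mapping networks, the feature extractors are omitted, and all names and dimensions (`F_rgb`, `F_3d`, `d_rgb`, `d_3d`) are illustrative assumptions. The mapping is fit on nominal features only; at test time, the discrepancy between mapped and observed features serves as the anomaly score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-location nominal features from the two
# modalities (dimensions are illustrative assumptions).
n, d_rgb, d_3d = 500, 64, 32
F_rgb = rng.normal(size=(n, d_rgb))
W_true = rng.normal(size=(d_rgb, d_3d))          # hidden nominal relation
F_3d = F_rgb @ W_true + 0.01 * rng.normal(size=(n, d_3d))

# Fit the RGB -> 3D mapping on nominal samples only (least squares in
# place of the paper's learned MLP mapping networks).
W, *_ = np.linalg.lstsq(F_rgb, F_3d, rcond=None)

def anomaly_score(f_rgb, f_3d):
    """Discrepancy between mapped and observed 3D features."""
    return np.linalg.norm(f_rgb @ W - f_3d, axis=-1)

# Nominal test features follow the learned relation and score low;
# perturbed (anomalous) geometry breaks the relation and scores high.
f_rgb_test = rng.normal(size=(10, d_rgb))
f_3d_nominal = f_rgb_test @ W_true
f_3d_anom = f_3d_nominal + rng.normal(size=(10, d_3d))
s_nom = anomaly_score(f_rgb_test, f_3d_nominal)
s_anom = anomaly_score(f_rgb_test, f_3d_anom)
```

In the full method the same test is applied per spatial location and in both directions (2D-to-3D and 3D-to-2D), producing a dense anomaly map rather than a single score per sample.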