Industrial anomaly detection is generally addressed as an unsupervised task that aims to locate defects using only normal training samples. Numerous 2D anomaly detection methods have recently been proposed and have achieved promising results; however, 2D RGB data alone is not sufficient to identify imperceptible geometric surface anomalies. Hence, in this work, we focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that use models pre-trained on large-scale visual datasets, i.e., ImageNet, to construct feature databases. We empirically find that directly using these pre-trained models is not optimal: they can either fail to detect subtle defects or mistake abnormal features for normal ones. This may be attributed to the domain gap between the target industrial data and the source data. To address this problem, we propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method that fine-tunes adaptors and learns task-oriented representations for anomaly detection. In LSFA, both intra-modal adaptation and cross-modal alignment are optimized from a local-to-global perspective to ensure representation quality and consistency at the inference stage. Extensive experiments demonstrate that our method not only brings a significant performance boost to feature-embedding-based approaches, but also clearly outperforms previous state-of-the-art (SoTA) methods on both the MVTec-3D AD and Eyecandies datasets; e.g., LSFA achieves 97.1% I-AUROC on MVTec-3D, surpassing the previous SoTA by +3.4%.
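To make the notions of a "feature database" and of I-AUROC concrete, the following is a minimal sketch of the generic feature-embedding scoring scheme that the abstract refers to, not the LSFA method itself. It assumes patch features have already been extracted by some pre-trained model; the toy 2-D vectors and the helper names `nearest_distance`, `score_image`, and `image_auroc` are hypothetical and chosen only for illustration.

```python
import math


def nearest_distance(feature, memory_bank):
    """Euclidean distance from one patch feature to its nearest normal feature."""
    return min(math.dist(feature, normal) for normal in memory_bank)


def score_image(patch_features, memory_bank):
    """Return (image_score, patch_scores).

    Each patch is scored by its distance to the nearest entry in the bank of
    normal features; the image-level score is the largest patch score.
    """
    patch_scores = [nearest_distance(f, memory_bank) for f in patch_features]
    return max(patch_scores), patch_scores


def image_auroc(normal_scores, anomalous_scores):
    """Image-level AUROC: fraction of (anomalous, normal) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomalous_scores))


# Toy data (assumed values): features collected from normal training samples.
memory_bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Two test images, each described by two patch features.
normal_image = [(0.1, 0.0), (0.9, 0.1)]
defect_image = [(0.1, 0.0), (3.0, 4.0)]  # second patch lies far from the bank

normal_score, _ = score_image(normal_image, memory_bank)
defect_score, defect_patches = score_image(defect_image, memory_bank)

print("normal image score:", normal_score)
print("defect image score:", defect_score)
print("I-AUROC on toy data:", image_auroc([normal_score], [defect_score]))
```

In this scheme the quality of the stored features determines everything: if the pre-trained features place a defective patch close to normal ones, its distance is small and the defect is missed, which is the failure mode that motivates adapting the features to the target domain.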