In industrial anomaly detection, model efficiency and mobile-friendliness are primary concerns in real-world applications. At the same time, the impressive generalization capabilities of Segment Anything (SAM) have attracted broad academic attention, making it an ideal choice for localizing unseen anomalies and diverse real-world patterns. Considering these two factors, we propose a SAM-guided Two-stream Lightweight Model for unsupervised anomaly detection (STLM), which meets both practical requirements while harnessing the robust generalization capabilities of SAM. STLM employs two lightweight image encoders, which together form our two-stream lightweight module, guided by SAM's knowledge. Specifically, one stream is trained to generate discriminative and general feature representations in both normal and anomalous regions, while the other stream reconstructs the same images without anomalies; this widens the gap between the two streams' representations in anomalous regions. Furthermore, we employ a shared mask decoder and a feature aggregation module to generate anomaly maps. Experiments on the MVTec AD benchmark show that STLM, with about 16M parameters and an inference time of 20 ms, is competitive with state-of-the-art methods, reaching 98.26% pixel-level AUC and 94.92% PRO. We further evaluate on more difficult datasets, e.g., VisA and DAGM, to demonstrate the effectiveness and generalizability of STLM.
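The two-stream intuition can be illustrated with a toy sketch: where the two streams agree, the image is likely normal, and where they disagree, an anomaly is likely. The snippet below is only an illustration under stated assumptions. It uses random arrays in place of encoder outputs and a plain per-pixel cosine distance in place of the shared mask decoder and feature aggregation module, so it is not the STLM pipeline itself. The names `general_feats`, `recon_feats`, and `cosine_distance_map` are hypothetical.

```python
import numpy as np


def cosine_distance_map(feat_a, feat_b, eps=1e-8):
    """Per-pixel cosine distance between two (C, H, W) feature maps."""
    dot = (feat_a * feat_b).sum(axis=0)
    norm = np.linalg.norm(feat_a, axis=0) * np.linalg.norm(feat_b, axis=0)
    return 1.0 - dot / (norm + eps)


rng = np.random.default_rng(0)
C, H, W = 8, 16, 16

# Hypothetical stand-in for the stream that represents the image as it is,
# anomalies included.
general_feats = rng.normal(size=(C, H, W))

# Hypothetical stand-in for the stream that represents an anomaly-free
# reconstruction: identical on normal pixels, different inside a 4x4 patch
# that plays the role of an anomalous region.
recon_feats = general_feats.copy()
recon_feats[:, 4:8, 4:8] = rng.normal(size=(C, 4, 4))

# Assumption: a simple cosine distance replaces the learned decoder.
anomaly_map = cosine_distance_map(general_feats, recon_feats)

inside = anomaly_map[4:8, 4:8].mean()
outside_mask = np.ones((H, W), dtype=bool)
outside_mask[4:8, 4:8] = False
outside = anomaly_map[outside_mask].mean()
print(f"mean score inside patch: {inside:.3f}, outside: {outside:.3g}")
```

In this toy setting the score is near zero wherever the two feature maps coincide and clearly larger inside the perturbed patch, which is the kind of contrast the two-stream design aims to produce for real anomalies.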