Reconstruction-based approaches have achieved remarkable results in anomaly detection. The strong image reconstruction capabilities of recently popular diffusion models have prompted research into using them to better reconstruct anomalous images. Nonetheless, in the more practical multi-class setting, these methods can struggle to preserve image categories and pixel-wise structural integrity. To address these problems, we propose a Diffusion-based Anomaly Detection (DiAD) framework for multi-class anomaly detection, which consists of a pixel-space autoencoder, a latent-space Semantic-Guided (SG) network connected to Stable Diffusion's denoising network, and a feature-space pre-trained feature extractor. First, the SG network is proposed for reconstructing anomalous regions while preserving the original image's semantic information. Second, we introduce a Spatial-aware Feature Fusion (SFF) block to maximize reconstruction accuracy when dealing with extensively reconstructed areas. Third, the input and reconstructed images are processed by a pre-trained feature extractor to generate anomaly maps based on features extracted at different scales. Experiments on the MVTec-AD and VisA datasets demonstrate the effectiveness of our approach, which surpasses state-of-the-art methods, e.g., achieving 96.8/52.6 and 97.2/99.0 (AUROC/AP) for localization and detection, respectively, on the multi-class MVTec-AD dataset. Code will be available at https://lewandofskee.github.io/projects/diad.
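The final step of the pipeline, comparing input and reconstructed images in feature space at several scales, can be illustrated with a minimal NumPy sketch. It is not the DiAD implementation: the abstract does not state the distance measure or how scales are combined, so the per-location cosine distance, the nearest-neighbour upsampling, the summation over scales, and all function names and tensor shapes below are assumptions made for illustration. Random arrays stand in for the outputs of the pre-trained feature extractor.

```python
import numpy as np


def cosine_distance_map(feat_a, feat_b, eps=1e-8):
    """Per-location cosine distance between two (C, H, W) feature maps."""
    dot = (feat_a * feat_b).sum(axis=0)
    norm = np.linalg.norm(feat_a, axis=0) * np.linalg.norm(feat_b, axis=0)
    return 1.0 - dot / (norm + eps)


def upsample_nearest(dist_map, out_size):
    """Nearest-neighbour resize of an (H, W) map to (out_size, out_size)."""
    h, w = dist_map.shape
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return dist_map[np.ix_(rows, cols)]


def anomaly_map(input_feats, recon_feats, out_size):
    """Sum upsampled per-scale distance maps (assumed fusion rule)."""
    total = np.zeros((out_size, out_size))
    for f_in, f_rec in zip(input_feats, recon_feats):
        total += upsample_nearest(cosine_distance_map(f_in, f_rec), out_size)
    return total


# Toy stand-ins for extractor outputs at two scales (hypothetical shapes).
rng = np.random.default_rng(0)
input_feats = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
recon_feats = [f.copy() for f in input_feats]
# Simulate a region that the reconstruction changed at the finest scale.
recon_feats[0][:, 4:8, 4:8] *= -1.0

amap = anomaly_map(input_feats, recon_feats, out_size=32)  # localization map
image_score = amap.max()                                   # detection score (assumed rule)
```

In this toy setup the map is close to zero wherever input and reconstruction features agree and is large only in the altered region, which is the behaviour a reconstruction-based detector relies on: anomalous regions are reconstructed as normal, so their features diverge from those of the input.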