Anomaly detection and localization without manual annotations or prior knowledge is a challenging task in the unsupervised learning setting. Existing works achieve excellent anomaly detection performance, but rely on complex networks or cumbersome pipelines. To address this issue, this paper explores a simple but effective architecture for anomaly detection. It consists of a pre-trained encoder that extracts hierarchical feature representations and a decoder that reconstructs these intermediate features from the encoder. In particular, it requires neither data augmentation nor anomalous images for training. Anomalies are detected when the decoder fails to reconstruct the features well, and the hierarchical feature reconstruction errors are then aggregated into an anomaly map to localize the anomalies. Comparing the encoder and decoder features at multiple levels leads to more accurate and robust localization than the single-feature or pixel-by-pixel comparisons used in conventional works. Experimental results show that the proposed method outperforms state-of-the-art methods on the MNIST, Fashion-MNIST, CIFAR-10, and MVTec Anomaly Detection datasets in both anomaly detection and localization.
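The aggregation of hierarchical feature reconstruction errors into an anomaly map can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the upsampling scheme, or how the image-level score is derived, so everything below is an assumption made for illustration: cosine distance per spatial location, nearest-neighbor upsampling to a common resolution, summation across levels, and the map maximum as the detection score. The feature grids are toy values standing in for real encoder and decoder outputs, and all function names are hypothetical.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity between two channel vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + eps)


def error_map(enc, dec):
    """Per-location reconstruction error for one feature level.

    enc and dec are H x W grids whose cells are channel vectors (lists of floats).
    """
    return [[cosine_distance(e, d) for e, d in zip(enc_row, dec_row)]
            for enc_row, dec_row in zip(enc, dec)]


def upsample_nearest(grid, out_h, out_w):
    """Nearest-neighbor upsampling of a 2D grid (assumed scheme)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def anomaly_map(enc_feats, dec_feats, out_h, out_w):
    """Sum the upsampled error maps of all feature levels (assumed aggregation)."""
    total = [[0.0] * out_w for _ in range(out_h)]
    for enc, dec in zip(enc_feats, dec_feats):
        up = upsample_nearest(error_map(enc, dec), out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                total[i][j] += up[i][j]
    return total


def anomaly_score(amap):
    """Image-level score taken as the maximum of the anomaly map (assumption)."""
    return max(max(row) for row in amap)


# Toy data: two feature levels (4x4 and 2x2), two channels each.
# The decoder output matches the encoder everywhere except the bottom-right corner.
enc1 = [[[1.0, 0.0] for _ in range(4)] for _ in range(4)]
dec1 = [[cell[:] for cell in row] for row in enc1]
dec1[3][3] = [0.0, 1.0]  # failed reconstruction at the fine level

enc2 = [[[1.0, 0.0] for _ in range(2)] for _ in range(2)]
dec2 = [[cell[:] for cell in row] for row in enc2]
dec2[1][1] = [0.0, 1.0]  # failed reconstruction at the coarse level

amap = anomaly_map([enc1, enc2], [dec1, dec2], 4, 4)
score = anomaly_score(amap)
```

In this toy case the location where both levels fail to reconstruct receives the largest value in the map, while well-reconstructed locations stay near zero. A real pipeline would obtain the feature grids from a pre-trained network and a trained decoder, and might weight or smooth the levels differently.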