Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers

Recent advances in industrial Anomaly Detection (AD) have shown that incorporating a few anomalous samples during training can significantly boost accuracy. However, this improvement comes at a high cost: extensive annotation effort, which is often impractical in real-world applications. In this work, we propose a novel framework called "Weakly-supervised RESidual Transformer" (WeakREST), which aims to achieve high AD accuracy while minimizing the need for annotation. First, we reformulate the pixel-wise anomaly localization task as a block-wise classification problem. Shifting the focus to the block level drastically reduces the amount of required annotation without compromising anomaly detection accuracy. Second, we design a residual-based transformer model, termed "Positional Fast Anomaly Residuals" (PosFAR), to classify the image blocks in real time. We further propose labeling the anomalous regions with only bounding boxes or image tags, which serve as weaker labels and lead to a semi-supervised learning setting. In the unsupervised setting on the benchmark dataset MVTec-AD, WeakREST achieves an Average Precision (AP) of 83.0%, significantly outperforming the previous best result of 75.8%. In the supervised AD setting, WeakREST improves further, attaining an AP of 87.6% compared to the previous best of 78.6%. Notably, even with weaker labels based on bounding boxes, WeakREST surpasses recent leading methods that rely on pixel-wise supervision, achieving an AP of 87.1% against the prior best of 78.6% on MVTec-AD. This precision advantage is also consistently observed on other well-known AD datasets, such as BTAD and KSDD2.
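
To make the block-wise reformulation concrete, the following is a minimal sketch of how a pixel-wise mask or a bounding box could be reduced to a grid of block labels. It is an illustration under stated assumptions, not the WeakREST implementation: the function names, the block size of 8, the `min_frac` threshold, and the rule that any overlap marks a block as anomalous are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def mask_to_block_labels(mask, block=8, min_frac=0.0):
    """Reduce an (H, W) binary pixel mask to an (H//block, W//block) label grid.

    Assumption: a block counts as anomalous when its fraction of anomalous
    pixels exceeds `min_frac` (hypothetical rule, not taken from the paper).
    """
    h, w = mask.shape
    gh, gw = h // block, w // block
    # Crop to a multiple of the block size, then split into tiles.
    tiles = mask[:gh * block, :gw * block].reshape(gh, block, gw, block)
    frac = tiles.astype(float).mean(axis=(1, 3))  # anomalous fraction per block
    return (frac > min_frac).astype(np.uint8)


def box_to_block_labels(box, shape, block=8):
    """Mark every block that overlaps a box (x0, y0, x1, y1), upper bounds exclusive.

    Assumption: a bounding-box annotation labels all overlapping blocks as
    anomalous, which is coarser than a pixel mask but cheaper to draw.
    """
    h, w = shape
    x0, y0, x1, y1 = box
    labels = np.zeros((h // block, w // block), dtype=np.uint8)
    row_end = -(-y1 // block)  # ceiling division
    col_end = -(-x1 // block)
    labels[y0 // block:row_end, x0 // block:col_end] = 1
    return labels


# Usage: a small defect inside a 32x32 image maps to a single 8x8 block.
mask = np.zeros((32, 32), dtype=np.uint8)
mask[10:14, 17:20] = 1
from_mask = mask_to_block_labels(mask)
from_box = box_to_block_labels((17, 10, 20, 14), mask.shape)
```

In this toy case both label sources yield the same 4x4 grid, which illustrates why a bounding box can stand in for a pixel mask once the target is defined at block resolution. For larger or irregular defects the box-derived grid would generally mark more blocks than the mask-derived one.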
