Motivated by efficiency requirements, most anomaly detection and segmentation (AD&S) methods process low-resolution images, e.g., $224\times 224$ pixels, obtained by downsampling the original input images. In this setting, the provided ground-truth defect masks are typically downsampled as well. Yet, as numerous industrial applications demand the identification of both large and tiny defects, this protocol may fail to provide a realistic picture of the performance actually attainable by current methods. Hence, in this work, we introduce a novel benchmark that evaluates methods on the original high-resolution images and ground-truth masks, focusing on segmentation performance as a function of anomaly size. Our benchmark includes a metric that captures robustness with respect to defect size, i.e., the ability of a method to preserve good localization from large anomalies down to tiny ones. Furthermore, we introduce an AD&S approach based on a novel Teacher-Student paradigm, which relies on two shallow MLPs (the Students) that learn to transfer patch features across the layers of a frozen vision transformer (the Teacher). By means of our benchmark, we evaluate our proposal and other recent AD&S methods on high-resolution inputs containing large and tiny defects. Our proposal features the highest robustness to defect size, runs the fastest, and yields state-of-the-art performance on the MVTec AD dataset as well as state-of-the-art segmentation performance on the VisA dataset.
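To make the evaluation issue concrete, here is a minimal NumPy sketch. It relies on three assumptions: nearest-neighbour resizing, a hypothetical 1024×1024 original resolution, and a hypothetical low-resolution anomaly map. The names `resize_nearest`, `FULL` and `LOW` are illustrative only and are not part of the benchmark. The sketch shows a 3×3 defect that vanishes entirely when the mask is downsampled to 224×224. The same prediction, once upsampled and compared with the original mask, can still be scored on that defect.

```python
import numpy as np

def resize_nearest(arr: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize via integer index mapping (works for up- and downsampling)."""
    in_h, in_w = arr.shape
    rows = (np.arange(out_h) * in_h) // out_h
    cols = (np.arange(out_w) * in_w) // out_w
    return arr[np.ix_(rows, cols)]

FULL, LOW = 1024, 224  # hypothetical original and model-input resolutions

# Full-resolution ground truth with a tiny 3x3 defect (rows/cols 499..501).
gt_full = np.zeros((FULL, FULL), dtype=bool)
gt_full[499:502, 499:502] = True

# Low-resolution protocol: the mask is downsampled to the model resolution.
gt_low = resize_nearest(gt_full, LOW, LOW)

# Hypothetical low-resolution anomaly map that fires on the cell covering the defect.
amap_low = np.zeros((LOW, LOW), dtype=np.float32)
amap_low[109, 109] = 1.0

# High-resolution protocol: upsample the anomaly map and score it on the original mask.
amap_full = resize_nearest(amap_low, FULL, FULL)
pred_full = amap_full > 0.5
recall = (pred_full & gt_full).sum() / gt_full.sum()

print(gt_full.sum(), gt_low.sum(), pred_full.sum(), recall)
```

Running it prints `9 0 16 1.0`. The downsampled mask contains no defect pixels, so a low-resolution protocol can neither reward nor penalize a method for finding this defect. Scoring at the original resolution, by contrast, does account for it. Whether a given tiny defect survives downsampling depends on its position and on the interpolation used. Bilinear resizing followed by thresholding, for example, behaves differently from nearest-neighbour indexing.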