Visual anomaly detection rests on learning normality from normal images. Previous methods, however, have each been developed for a specific task, which has fragmented the field across defect detection, semantic anomaly detection, multi-class anomaly detection, and anomaly clustering. This one-task-one-model approach is resource-intensive and incurs growing maintenance costs as the number of tasks increases. This paper presents SelFormaly, a universal and powerful anomaly detection framework. We motivate our off-the-shelf approach by showing that previous online encoder-based methods suffer from fluctuating, and therefore suboptimal, performance. We also question the effectiveness of the ConvNets used in prior work and confirm that self-supervised ViTs are well suited to unified anomaly detection. To achieve unified and powerful anomaly detection, we introduce back-patch masking and identify a new role for top k-ratio feature matching. Back-patch masking eliminates irrelevant regions that may hinder target-centric detection with representations of the scene layout. Top k-ratio feature matching unifies various anomaly levels and tasks. SelFormaly achieves state-of-the-art results across various datasets on all of the aforementioned tasks.
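The abstract does not spell out how top k-ratio feature matching is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each test patch is scored by its Euclidean distance to the nearest patch feature in a memory bank of normal features, and that the image-level score is the mean of the highest `k_ratio` fraction of those patch scores. The function names, the distance metric, and the mean aggregation are all illustrative assumptions.

```python
import math


def patch_scores(test_patches, memory_bank):
    """Score each test patch by the Euclidean distance to its nearest
    normal patch feature (assumed matching rule, not from the paper)."""
    return [min(math.dist(p, m) for m in memory_bank) for p in test_patches]


def top_k_ratio_score(scores, k_ratio):
    """Image-level score: mean of the highest `k_ratio` fraction of patch
    scores, using at least one patch (assumed aggregation rule)."""
    if not 0.0 < k_ratio <= 1.0:
        raise ValueError("k_ratio must be in (0, 1]")
    k = max(1, math.ceil(k_ratio * len(scores)))
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / k


# Toy 2-D features standing in for ViT patch embeddings (hypothetical data).
memory_bank = [(0.0, 0.0), (1.0, 0.0)]
test_patches = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (1.0, 0.1)]

scores = patch_scores(test_patches, memory_bank)
local_score = top_k_ratio_score(scores, k_ratio=0.25)   # only the worst patch
global_score = top_k_ratio_score(scores, k_ratio=1.0)   # all patches averaged
```

Under these assumptions, a small ratio lets a single strongly deviating patch dominate the score, which would suit localized defects, while a ratio near 1 averages over the whole image, which would suit image-wide semantic anomalies. This is one way a single ratio parameter could span different anomaly levels.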