Unsupervised visual inspection of defects in industrial products is challenging because product surfaces vary substantially. Current unsupervised models struggle to balance the detection of texture defects against the detection of object defects, and they lack the capacity to discern latent representations and intricate features. In this paper, we present a self-supervised learning algorithm that trains an encoder by solving the classic jigsaw puzzle task. Our approach divides the target image into nine patches and tasks the encoder with predicting the relative position of any two patches, which pushes it to extract rich semantics. We then introduce an affinity-augmentation method that accentuates the differences between normal and abnormal latent representations. The final detection results are obtained with the classic support vector data description (SVDD) algorithm. Experiments show that the proposed method achieves detection and segmentation performance of 95.8% and 96.8%, respectively, on the widely used MVTec AD dataset, reaching state-of-the-art results for both texture and object defects. Extensive experiments further support the effectiveness of the approach across diverse industrial applications.
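The patch-pair pretext task described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how relative positions are encoded, so the labelling below is an assumption: each ordered pair of distinct patches in the 3 x 3 grid is mapped to one of 24 classes, one per non-zero (row, column) offset. The function names `split_into_patches`, `relative_position_label`, and `sample_pair` are hypothetical and are not taken from the paper.

```python
import numpy as np


def split_into_patches(image, grid=3):
    """Split an H x W (x C) image into grid * grid patches in row-major order."""
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid  # trailing pixels are dropped if not divisible
    return [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(grid)
        for c in range(grid)
    ]


def relative_position_label(i, j, grid=3):
    """Map an ordered pair of distinct patch indices to a class id.

    Assumed labelling (not from the paper): the class is the index of the
    offset (dr, dc) from patch i to patch j among all non-zero offsets,
    which gives 24 classes for a 3 x 3 grid.
    """
    if i == j:
        raise ValueError("patch indices must differ")
    span = 2 * grid - 1                      # number of possible offsets per axis
    dr = j // grid - i // grid               # row offset, in [-(grid-1), grid-1]
    dc = j % grid - i % grid                 # column offset, same range
    flat = (dr + grid - 1) * span + (dc + grid - 1)
    centre = (span * span) // 2              # flat index of the (0, 0) offset
    return flat if flat < centre else flat - 1  # skip the unused (0, 0) slot


def sample_pair(image, rng, grid=3):
    """Draw two distinct patches and the label an encoder would be asked to predict."""
    patches = split_into_patches(image, grid)
    i, j = rng.choice(grid * grid, size=2, replace=False)
    return patches[i], patches[j], relative_position_label(int(i), int(j), grid)
```

In a training loop, `sample_pair(image, np.random.default_rng(0))` would supply two patches for the encoder and a target class for a classification head. The affinity-augmentation and SVDD stages are not covered by this sketch.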