Logical anomalies (LA) are data that violate underlying logical constraints, e.g., the quantity, arrangement, or composition of components within an image. Accurately detecting such anomalies requires models to reason about the various component types through segmentation. However, curating pixel-level annotations for semantic segmentation is both time-consuming and expensive. Although some few-shot and unsupervised co-part segmentation algorithms exist, they often fail on images of industrial objects, whose components have similar textures and shapes and are therefore hard to differentiate precisely. In this study, we introduce a novel component segmentation model for LA detection that leverages a few labeled samples together with unlabeled images sharing the same logical constraints. To ensure consistent segmentation across unlabeled images, we employ a histogram matching loss in conjunction with an entropy loss. Because segmentation predictions play a crucial role, we propose to enhance both local and global sample validity detection by capturing key aspects of visual semantics via three memory banks: class histograms, component composition embeddings, and patch-level representations. For effective LA detection, we propose an adaptive scaling strategy that standardizes the anomaly scores from the different memory banks at inference. Extensive experiments on the public benchmark MVTec LOCO AD show that our method achieves 98.1% AUROC in LA detection, compared with 89.6% for competing methods.
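As a rough illustration of how scores from several memory banks might be brought onto a common scale before being combined, the sketch below z-scores each bank's raw anomaly score using statistics gathered from normal samples and then sums the results. The abstract does not specify the scaling rule, so the nearest-neighbour distance as the raw score, the mean/standard-deviation scaling, the summation, and all function names (`nn_distance`, `fit_scaler`, `combined_score`) are assumptions made for this example and not the method's actual formulation.

```python
import numpy as np

def nn_distance(query, bank):
    """Euclidean distance from each query vector to its nearest bank entry.

    query: (n, d) array, bank: (m, d) array. Used here as a hypothetical
    raw anomaly score for one memory bank.
    """
    diffs = query[:, None, :] - bank[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

def fit_scaler(normal_scores):
    """Mean and std of raw scores on normal samples (assumed scaling rule)."""
    mu = float(np.mean(normal_scores))
    sigma = float(np.std(normal_scores)) + 1e-8  # avoid division by zero
    return mu, sigma

def combined_score(raw_scores, scalers):
    """Sum of per-bank z-scored anomaly scores.

    raw_scores: dict mapping bank name -> (n,) array of raw scores
    scalers:    dict mapping bank name -> (mu, sigma) from fit_scaler
    """
    total = 0.0
    for name, raw in raw_scores.items():
        mu, sigma = scalers[name]
        total = total + (np.asarray(raw, dtype=float) - mu) / sigma
    return total
```

Standardizing per bank keeps a bank whose raw distances happen to be numerically large (for example, patch-level features) from dominating one with small distances (for example, class histograms); whether the scaling should be fitted on held-out normal images or on the training set is left open in this sketch.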