The performance of visual anomaly inspection in industrial quality control is often constrained by the scarcity of real anomalous samples. Consequently, anomaly synthesis techniques have been developed to enlarge training sets and improve downstream inspection. However, existing methods have one of two problems. Some rely on inpainting, so the synthesized anomalies blend poorly with the surrounding image. Others fail to provide accurate anomaly masks. To address these limitations, we propose GroundingAnomaly, a novel few-shot anomaly image generation framework. The framework introduces a Spatial Conditioning Module that uses per-pixel semantic maps to give precise spatial control over the synthesized anomalies. In addition, a Gated Self-Attention Module injects conditioning tokens into a frozen U-Net through gated attention layers. This design preserves the pretrained priors while enabling stable few-shot adaptation. Extensive evaluations on the MVTec AD and VisA datasets show that GroundingAnomaly generates high-quality anomalies and achieves state-of-the-art performance on multiple downstream tasks, including anomaly detection, segmentation, and instance-level detection.
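The abstract does not give the exact formulation of the gated attention layers. The sketch below assumes a GLIGEN-style design: the frozen U-Net's visual tokens are concatenated with the conditioning tokens and attended jointly. The outputs at the visual-token positions are then added back to the features through a learnable scalar gate `tanh(gamma)`, with `gamma` initialized to zero. It illustrates why such a gate can preserve pretrained priors. At initialization the layer is an exact identity on the U-Net features, so training starts from the frozen model's behavior and departs from it only as `gamma` is learned. The class name, single-head attention, random weight initialization, and shapes are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class GatedSelfAttention:
    """Hypothetical single-head, GLIGEN-style gated self-attention layer.

    Assumption: only this layer (including gamma) would be trained,
    while the surrounding U-Net weights stay frozen.
    """

    def __init__(self, dim, rng):
        scale = 1.0 / np.sqrt(dim)
        # Query/key/value/output projections (randomly initialized here).
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_o = rng.normal(0.0, scale, (dim, dim))
        # Learnable scalar gate, zero-initialized so tanh(gamma) == 0.
        self.gamma = 0.0

    def __call__(self, visual, cond):
        """visual: (n_v, dim) U-Net feature tokens; cond: (n_c, dim) conditioning tokens."""
        tokens = np.concatenate([visual, cond], axis=0)
        q = tokens @ self.w_q
        k = tokens @ self.w_k
        v = tokens @ self.w_v
        # Joint self-attention over visual and conditioning tokens.
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        out = (attn @ v) @ self.w_o
        # Keep only the outputs at the visual-token positions.
        out = out[: visual.shape[0]]
        # Gated residual: identity on the visual features while the gate is closed.
        return visual + np.tanh(self.gamma) * out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = GatedSelfAttention(dim=8, rng=rng)
    visual = rng.normal(size=(16, 8))
    cond = rng.normal(size=(4, 8))
    # With gamma = 0 the frozen features pass through unchanged.
    print(np.allclose(layer(visual, cond), visual))
    # Once gamma is nonzero, the conditioning tokens start to affect the features.
    layer.gamma = 0.5
    print(np.allclose(layer(visual, cond), visual))
```

A scalar gate is the simplest way to make the injected branch start "switched off". A zero-initialized output projection would give the same identity-at-initialization property, but it would couple the gate to the attention weights.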