Industrial Anomaly Detection (IAD) is vital for manufacturing, yet traditional methods face significant challenges: unsupervised approaches yield coarse localizations that require manually set thresholds, while supervised methods overfit because anomaly data are scarce and imbalanced. Both suffer from the "One Anomaly Class, One Model" limitation. To address this, we propose Referring Industrial Anomaly Segmentation (RIAS), a paradigm that uses language to guide detection. RIAS generates precise masks from text descriptions without manual thresholds and uses universal prompts to detect diverse anomalies with a single model. To support this paradigm, we introduce the MVTec-Ref dataset, which is designed with diverse referring expressions and focuses on anomaly patterns; notably, 95% of its anomalies are small. We also propose a benchmark model, the Dual Query Token with Mask Group Transformer (DQFormer), enhanced by Language-Gated Multi-Level Aggregation (LMA) to improve multi-scale segmentation. Unlike traditional methods that rely on redundant queries, DQFormer employs only "Anomaly" and "Background" tokens for efficient visual-textual integration. Experiments demonstrate the effectiveness of RIAS in advancing IAD toward open-set capabilities. Code: https://github.com/swagger-coder/RIAS-MVTec-Ref.
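To illustrate why two competing tokens can remove the need for a manually tuned threshold, the following is a minimal sketch and not the DQFormer architecture itself. It assumes each pixel has a feature vector and that the "Anomaly" and "Background" tokens are plain vectors in the same space; the function name `two_token_mask` and the toy values are hypothetical. Each pixel is assigned to whichever token scores higher, so the decision boundary comes from the comparison rather than from a hand-set cutoff on an anomaly score.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_token_mask(
    pixel_features: List[List[Vector]],
    anomaly_query: Vector,
    background_query: Vector,
) -> List[List[int]]:
    """Return a binary mask: 1 where the anomaly logit beats the background logit.

    Assumption (illustrative only): mask logits are dot products between
    per-pixel features and the two query tokens. Ties go to background.
    """
    mask = []
    for row in pixel_features:
        mask.append(
            [
                1 if dot(f, anomaly_query) > dot(f, background_query) else 0
                for f in row
            ]
        )
    return mask


# Toy 2x2 feature map with 2-dimensional features (made-up values).
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.2, 0.9], [0.8, 0.1]],
]
anomaly_q = [1.0, 0.0]      # hypothetical "Anomaly" token
background_q = [0.0, 1.0]   # hypothetical "Background" token

mask = two_token_mask(features, anomaly_q, background_q)
print(mask)
```

In this sketch the per-pixel comparison plays the role that a tuned threshold plays in score-map-based unsupervised methods; in a trained model the features and tokens would be learned and conditioned on the referring expression.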