DifferAD-R1: A Difference-Guided Industrial Anomaly Localization with Multimodal Large Language Models

Industrial anomaly localization aims to accurately identify and localize abnormal regions in industrial products, addressing the critical challenge of detecting unseen defect categories in real-world scenarios. Traditional closed-set methods often generalize poorly across scenarios, while existing Multimodal Large Language Model (MLLM)-based approaches face two core limitations: they either adopt QA-style paradigms misaligned with the practical demands of localization, or rely on standard optimization techniques such as Group Relative Policy Optimization (GRPO), which fails to deliver effective learning signals for subtle defects. To tackle these issues, this paper proposes DifferAD-R1, an MLLM-augmented reinforcement learning framework tailored for industrial anomaly localization. We design a Difference-Guided dual-image paradigm, which reformulates the localization task as a one-shot difference grounding problem to effectively explore cross-scenario anomalies. A Dual-Consistency Localization Reward is developed for hard-to-detect anomalies, enhancing optimization stability and robustness. Additionally, we integrate a difficulty-aware strategy with adaptive reweighting and group-wise resampling to prioritize learning on challenging instances. To facilitate evaluation in real-world industrial settings, we construct the AD-DualDiff dataset, comprising 13K paired images across 20 categories. Experimental results demonstrate that DifferAD-R1 significantly outperforms existing baselines and achieves performance competitive with large-scale models such as Qwen3-VL (235B parameters). Our code is publicly available at: https://github.com/Rong2026/work-1.
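One commonly cited reason GRPO can give weak learning signals on subtle defects is its group-relative normalization. Each rollout's reward is compared against the mean and standard deviation of its own group. If every rollout for a hard image receives the same reward (for example, all predicted boxes miss a tiny defect), every advantage becomes zero. The sketch below illustrates this under stated assumptions. It uses a plain IoU reward on `(x1, y1, x2, y2)` boxes, and the boxes are hypothetical. It is not the paper's Dual-Consistency Localization Reward, which the abstract does not define.

```python
from statistics import mean, pstdev

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (assumed reward)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: (r_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


gt = (40.0, 40.0, 60.0, 60.0)  # hypothetical ground-truth defect box

# Easy sample: rollouts differ in quality, so advantages carry signal.
easy = [(40, 40, 60, 60), (45, 45, 65, 65), (0, 0, 10, 10), (38, 42, 58, 62)]
# Hard (subtle) sample: every rollout misses the defect, so all rewards are 0.
hard = [(0, 0, 10, 10), (80, 80, 95, 95), (0, 80, 15, 95), (85, 0, 99, 12)]

easy_adv = group_relative_advantages([iou(p, gt) for p in easy])
hard_adv = group_relative_advantages([iou(p, gt) for p in hard])

print([round(a, 2) for a in easy_adv])
print(hard_adv)  # all zeros: the group gives no learning signal
```

Because the hard group's advantages are identically zero, a policy-gradient update receives nothing from that image. Reward designs that still separate near-misses, as well as difficulty-aware reweighting or resampling, are plausible ways to counter this. How DifferAD-R1 does so is described in the paper, not here.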
