Industrial anomaly detection is important for smart manufacturing, but many deep learning approaches produce only binary decisions and provide limited semantic explanations. Multimodal large language models (MLLMs) can potentially generate fine-grained, language-based analyses, yet existing methods often require costly fine-tuning and do not consistently improve anomaly detection accuracy compared to lightweight specialist detectors. We propose expert-augmented attention guidance for industrial anomaly detection in MLLMs (EAGLE), a tuning-free framework that integrates the outputs of an expert model to guide MLLMs toward both accurate detection and interpretable anomaly descriptions. We further study how EAGLE affects MLLM internals by examining how attention in the intermediate layers of MLLMs is distributed over anomalous image regions. We observe that successful anomaly detection is associated with increased attention concentration on anomalous regions, and that EAGLE tends to encourage this alignment. Experiments on MVTec-AD and VisA show that EAGLE improves anomaly detection performance across multiple MLLMs without any parameter updates, achieving results comparable to fine-tuning-based methods. Code is available at \href{https://github.com/shengtun/Eagle}{https://github.com/shengtun/Eagle}.