Industrial anomaly detection increasingly relies on foundation models, aiming for strong out-of-distribution generalization and rapid adaptation in real-world deployments. Notably, prior studies have focused primarily on textual prompt tuning, leaving the intrinsic visual counterpart fragmented into processing steps specific to each foundation model. We address this limitation with a unified patch-focused framework, Patch-Exclusive Anomaly Detection (PatchEAD), which enables training-free anomaly detection compatible with diverse foundation models. The framework integrates visual prompting techniques, including an alignment module and foreground masking. Our experiments show superior few-shot and batch zero-shot performance compared to prior work, despite the absence of textual features. We further examine how backbone structure and pretraining characteristics affect patch-similarity robustness, providing actionable guidance for selecting and configuring foundation models for real-world visual inspection. These results confirm that a well-unified patch-only framework enables quick, calibration-light deployment without carefully engineered textual prompts.
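To make the patch-only, training-free idea concrete, the sketch below illustrates one common form of patch-similarity scoring: patch features from a few normal images form a memory bank, and each test patch is scored by its distance to the nearest normal patch. All function names, the cosine-similarity scoring rule, and the synthetic data here are illustrative assumptions for exposition, not PatchEAD's actual components.

```python
import numpy as np

def normalize(x):
    """L2-normalize feature vectors along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def patch_anomaly_scores(memory, query):
    """Training-free patch-similarity scoring (illustrative).

    memory: (M, D) L2-normalized patch features from a few normal images.
    query:  (N, D) L2-normalized patch features from one test image.
    Returns (N,) anomaly scores: 1 - max cosine similarity, so a patch
    that closely matches some normal patch scores near 0.
    """
    sims = query @ memory.T           # (N, M) cosine similarities
    return 1.0 - sims.max(axis=1)     # nearest-neighbor score per patch

# Synthetic demo: random features standing in for a backbone's patch embeddings.
rng = np.random.default_rng(0)
memory = normalize(rng.normal(size=(256, 64)))  # few-shot normal memory bank
query = normalize(rng.normal(size=(16, 64)))    # patches of a test image
query[0] = memory[3]                            # one patch identical to a normal one
scores = patch_anomaly_scores(memory, query)    # scores[0] is ~0 (normal match)
```

An image-level score can then be taken as the maximum patch score, which is what makes the approach calibration-light: no text prompts or fine-tuning are involved, only feature extraction and nearest-neighbor lookup.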