The CLIP model's outstanding generalization has driven recent success in Zero-Shot Anomaly Detection (ZSAD) for detecting anomalies in unseen categories. The core challenge in ZSAD is to specialize the model for anomaly detection tasks while preserving CLIP's powerful generalization capability. Existing approaches attempting to solve this challenge share the fundamental limitation of a patch-agnostic design that processes all patches monolithically without regard for their unique characteristics. To address this limitation, we propose \textbf{MoECLIP}, a Mixture-of-Experts (MoE) architecture for the ZSAD task, which achieves patch-level adaptation by dynamically routing each image patch to a specialized Low-Rank Adaptation (LoRA) expert based on its unique characteristics. Furthermore, to prevent functional redundancy among the LoRA experts, we introduce (1) Frozen Orthogonal Feature Separation (FOFS), which orthogonally separates the input feature space to force experts to focus on distinct information, and (2) a simplex equiangular tight frame (ETF) loss to regulate the expert outputs to form maximally equiangular representations. Comprehensive experimental results across 14 benchmark datasets spanning industrial and medical domains demonstrate that MoECLIP outperforms existing state-of-the-art methods. The code is available at https://github.com/CoCoRessa/MoECLIP.
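To make the two core ideas concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes a softmax router with top-k selection, one low-rank (LoRA-style) update per expert added residually to the patch feature, and an ETF-style penalty that pushes pairwise cosines between expert representations toward the simplex ETF value of $-1/(E-1)$. The names `route_patches`, `etf_penalty`, and `simplex_etf`, the tensor layouts, and the squared-error form of the penalty are all illustrative assumptions; the abstract does not specify the router, the loss form, or how FOFS is built, so FOFS is left out here.

```python
import numpy as np


def route_patches(patches, router_w, lora_a, lora_b, top_k=1):
    """Route each patch to its top-k LoRA experts (hypothetical layout).

    patches:  (N, D) patch features
    router_w: (D, E) router weights
    lora_a:   (E, D, R) per-expert down-projections
    lora_b:   (E, R, D) per-expert up-projections
    Returns the adapted features (N, D) and the chosen expert ids (N, top_k).
    """
    logits = patches @ router_w                          # (N, E)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    top = np.argsort(-probs, axis=1)[:, :top_k]          # (N, top_k)

    update = np.zeros_like(patches)
    for e in range(router_w.shape[1]):
        rows, _ = np.nonzero(top == e)                   # patches sent to expert e
        if rows.size == 0:
            continue
        delta = patches[rows] @ lora_a[e] @ lora_b[e]    # low-rank update, (n, D)
        update[rows] += probs[rows, e][:, None] * delta  # gate by router weight
    return patches + update, top


def simplex_etf(num_experts):
    """Rows form a simplex ETF: unit norm, pairwise cosine -1/(E-1)."""
    e = num_experts
    return np.sqrt(e / (e - 1)) * (np.eye(e) - np.ones((e, e)) / e)


def etf_penalty(expert_vecs):
    """Mean squared gap between pairwise cosines and the ETF target.

    expert_vecs: (E, D), one representative output vector per expert, E >= 2.
    """
    e = expert_vecs.shape[0]
    unit = expert_vecs / np.linalg.norm(expert_vecs, axis=1, keepdims=True)
    cos = unit @ unit.T
    off_diag = cos[~np.eye(e, dtype=bool)]
    return float(np.mean((off_diag + 1.0 / (e - 1)) ** 2))


# Usage with random stand-in data (shapes are arbitrary).
rng = np.random.default_rng(0)
N, D, E, R = 16, 32, 4, 2
x = rng.normal(size=(N, D))
adapted, chosen = route_patches(
    x,
    rng.normal(size=(D, E)),
    rng.normal(size=(E, D, R)) * 0.1,
    rng.normal(size=(E, R, D)) * 0.1,
)
penalty = etf_penalty(rng.normal(size=(E, D)))
```

Under these assumptions, the penalty is zero exactly when the expert vectors are maximally and equally separated in angle, which is one way to discourage experts from collapsing onto the same function.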