Zero-shot anomaly segmentation using pre-trained foundation models is a promising approach that enables effective algorithms without expensive, domain-specific training or fine-tuning. Ensuring that these methods work across varied environmental conditions and remain robust to distribution shifts is an open problem. We investigate the performance of the WinCLIP [14] zero-shot anomaly segmentation algorithm by perturbing test data with three semantic transformations: bounded angular rotations, bounded saturation shifts, and hue shifts. We empirically measure a lower performance bound by aggregating across per-sample worst-case perturbations and find that average performance drops by up to 20% in area under the ROC curve and up to 40% in area under the per-region overlap curve. Performance degrades consistently on three CLIP backbones, regardless of model architecture or learning objective, demonstrating a need for careful performance evaluation.
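The lower-bound measurement described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `worst_case_lower_bound`, the grid values, and the toy scoring function are all hypothetical, and the sketch assumes a simple grid search over (rotation angle, saturation factor, hue shift) with a per-sample score, whereas the actual search procedure and metric aggregation may differ.

```python
import itertools
import numpy as np

def worst_case_lower_bound(samples, score_fn, param_grid):
    """For each sample, find the perturbation with the lowest score,
    then average those per-sample minima (hypothetical helper)."""
    worst_scores, worst_params = [], []
    for sample in samples:
        best = None
        for params in param_grid:
            score = score_fn(sample, params)
            # Keep the first perturbation that attains the minimum score.
            if best is None or score < best[0]:
                best = (score, params)
        worst_scores.append(best[0])
        worst_params.append(best[1])
    return float(np.mean(worst_scores)), worst_params

# Assumed, illustrative bounds; the identity (0.0, 1.0, 0.0) is included
# so the bound can never exceed the unperturbed score.
angles = [-10.0, 0.0, 10.0]      # rotation in degrees
saturations = [0.5, 1.0, 1.5]    # multiplicative saturation factor
hues = [0.0, 0.25, 0.5]          # hue shift as a fraction of the hue circle
grid = list(itertools.product(angles, saturations, hues))

def toy_score(sample, params):
    """Stand-in for a real per-sample segmentation score: the clean score
    minus a penalty that grows with perturbation strength."""
    angle, sat, hue = params
    return sample - 0.01 * abs(angle) - 0.1 * abs(sat - 1.0) - 0.2 * hue

clean_scores = [0.9, 0.8]  # toy "clean" per-sample scores
lower_bound, worst = worst_case_lower_bound(clean_scores, toy_score, grid)
```

In a real evaluation, `score_fn` would apply the rotation, saturation, and hue transformations to the image (and the rotation to the ground-truth mask), run the segmentation model, and return the per-sample score.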