Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

This paper explores the problem of Generalist Anomaly Detection (GAD), which aims to train a single detection model that generalizes to anomalies in diverse datasets from different application domains without any further training on the target data. Recent studies have shown that large pre-trained Visual-Language Models (VLMs) such as CLIP generalize well in detecting industrial defects across various datasets, but these methods rely heavily on handcrafted text prompts about defects, which makes them difficult to extend to anomalies in other applications, e.g., medical image anomalies or semantic anomalies in natural images. In this work, we propose to train a GAD model that uses few-shot normal images as sample prompts to perform AD on diverse datasets on the fly. To this end, we introduce a novel in-context residual learning approach for GAD, termed InCTRL. It is trained on an auxiliary dataset to discriminate anomalies from normal samples based on a holistic evaluation of the residuals between query images and few-shot normal sample prompts. Regardless of the dataset, anomalies are by definition expected to yield larger residuals than normal samples, which enables InCTRL to generalize across domains without further training. Comprehensive experiments on nine AD datasets establish a GAD benchmark that covers the detection of industrial defect anomalies, medical anomalies, and semantic anomalies in both one-vs-all and multi-class settings, on which InCTRL is the best performer and significantly outperforms state-of-the-art competing methods.
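The residual idea in the abstract can be illustrated with a small toy sketch. It is not the InCTRL model: it has no learned components, the feature vectors are hypothetical stand-ins for encoder outputs, and both the residual definition (one minus the highest cosine similarity to any prompt patch) and the max-over-patches score are assumptions chosen for illustration. It only shows why a query that deviates from the few-shot normal prompts produces a larger residual than a query that resembles them.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def patch_residuals(query_patches, prompt_patches):
    """Residual per query patch: 1 - similarity to its closest prompt patch.

    Assumption for this sketch: prompt_patches pools the patch features of
    all few-shot normal images into one flat list.
    """
    return [
        1.0 - max(cosine_similarity(q, p) for p in prompt_patches)
        for q in query_patches
    ]


def residual_score(query_patches, prompt_patches):
    """Hypothetical image-level score: the largest patch residual."""
    return max(patch_residuals(query_patches, prompt_patches))


# Hypothetical 2-D patch features pooled from a few normal prompt images.
normal_prompts = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

# A query whose patches all resemble the normal prompts.
normal_query = [[1.0, 0.1], [0.1, 1.0]]

# A query with one patch that resembles none of the normal prompts.
anomalous_query = [[1.0, 0.1], [-1.0, -1.0]]

normal_score = residual_score(normal_query, normal_prompts)
anomalous_score = residual_score(anomalous_query, normal_prompts)
```

In this toy setup `anomalous_score` exceeds `normal_score` because one patch of the anomalous query has no close match among the normal prompts. Since the comparison is always made against prompts from the target dataset itself, the same scoring function applies unchanged to a new dataset, which is the intuition behind generalizing without further training.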
