Anomaly detection (AD) aims to identify defective images and localize their defects (if any). Ideally, AD models should detect defects across many image classes without relying on hard-coded class names, which can be uninformative or inconsistent across datasets; learn without anomaly supervision; and be robust to the long-tailed distributions of real-world applications. To address these challenges, we formulate the problem of long-tailed AD by introducing several datasets with different levels of class imbalance, together with metrics for performance evaluation. We then propose a novel method, LTAD, to detect defects from multiple and long-tailed classes without relying on dataset class names. LTAD combines two modules: AD by reconstruction and semantic AD. AD by reconstruction is implemented with a transformer-based reconstruction module. Semantic AD is implemented with a binary classifier that relies on learned pseudo-class names and a pretrained foundation model. These modules are learned in two phases. Phase 1 learns the pseudo-class names and a variational autoencoder (VAE) for feature synthesis, which augments the training data to counter the long tail. Phase 2 then learns the parameters of the reconstruction and classification modules of LTAD. Extensive experiments on the proposed long-tailed datasets show that LTAD substantially outperforms state-of-the-art methods for most forms of dataset imbalance. The long-tailed dataset split is available on Zenodo (https://zenodo.org/records/10854201).
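As an illustration of what "different levels of class imbalance" can mean in practice, the sketch below builds a long-tailed training split by subsampling a balanced dataset so that per-class counts decay exponentially from the largest (head) class to the smallest (tail) class. This is a common convention in the long-tailed recognition literature and is an assumption here; the abstract does not state how the authors' splits were constructed, and their actual split is the one published on Zenodo. The function names and the `imbalance_factor` parameter are hypothetical. Class labels are used only to construct the split, not by the detector.

```python
import random


def long_tailed_counts(num_classes: int, max_count: int, imbalance_factor: float) -> list[int]:
    """Per-class sample counts that decay exponentially from head to tail.

    imbalance_factor is the (assumed) ratio between the head-class count
    and the tail-class count; 1.0 yields a balanced split.
    """
    if num_classes < 1 or max_count < 1 or imbalance_factor < 1:
        raise ValueError("num_classes, max_count and imbalance_factor must all be >= 1")
    if num_classes == 1:
        return [max_count]
    counts = []
    for i in range(num_classes):
        # Position of class i on the head (0.0) to tail (1.0) axis.
        frac = i / (num_classes - 1)
        counts.append(max(1, round(max_count * imbalance_factor ** (-frac))))
    return counts


def subsample_long_tailed(samples_by_class: dict, imbalance_factor: float, seed: int = 0) -> dict:
    """Subsample each class of a dataset to follow a long-tailed profile."""
    rng = random.Random(seed)
    # Largest classes come first so that they become the head of the distribution.
    names = sorted(samples_by_class, key=lambda c: len(samples_by_class[c]), reverse=True)
    max_count = len(samples_by_class[names[0]])
    targets = long_tailed_counts(len(names), max_count, imbalance_factor)
    return {
        name: rng.sample(samples_by_class[name], min(target, len(samples_by_class[name])))
        for name, target in zip(names, targets)
    }


if __name__ == "__main__":
    # Toy data: four classes with 100 placeholder sample ids each.
    balanced = {name: list(range(100)) for name in ["a", "b", "c", "d"]}
    split = subsample_long_tailed(balanced, imbalance_factor=10)
    print({name: len(items) for name, items in split.items()})
```

Varying `imbalance_factor` gives a family of splits with increasing skew, which is one way to probe how a detector degrades as tail classes become scarce.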