ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection

Visual anomaly detection aims to identify anomalous regions in images under unsupervised learning paradigms, and it is increasingly in demand for applications such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is no comprehensive benchmark that adequately evaluates mainstream methods across different datasets under the practical multi-class setting. Without standardized experimental setups, comparisons can be biased by differences in training epochs, input resolution, and metric computation, which can lead to erroneous conclusions. This paper addresses this issue by proposing \textbf{\textit{ADer}}, a comprehensive visual anomaly detection benchmark built as a modular framework that is readily extensible to new methods. The benchmark covers multiple datasets from the industrial and medical domains and implements fifteen state-of-the-art methods and nine comprehensive metrics. Additionally, we open-source the GPU-accelerated \href{https://pypi.org/project/ADEval}{ADEval} package, which speeds up time-consuming metrics such as mAU-PRO on large-scale data and reduces evaluation time by more than \textit{1000-fold}. Through extensive experiments, we objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection. We hope that \textbf{\textit{ADer}} will become a valuable resource for researchers and practitioners in the field and will promote the development of more robust and generalizable anomaly detection systems. The full code is attached in the Appendix and open-sourced at \url{https://github.com/zhangzjn/ader}.