ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection

Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, and demand for it is growing in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is no comprehensive benchmark that adequately evaluates mainstream methods across different datasets under the practical multi-class setting. Without a standardized experimental setup, differences in training epochs, image resolution, and metric computation can bias results and lead to erroneous conclusions. This paper addresses the issue by proposing \textbf{\textit{ADer}}, a comprehensive visual anomaly detection benchmark built as a modular framework that is highly extensible to new methods. The benchmark covers multiple datasets from the industrial and medical domains and implements fifteen state-of-the-art methods and nine comprehensive metrics. Additionally, we have open-sourced the GPU-assisted \href{https://pypi.org/project/ADEval}{ADEval} package to address the slow evaluation of time-consuming metrics such as mAU-PRO on large-scale data, speeding up evaluation by more than \textit{1000-fold}. Through extensive experiments, we objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection. We hope that \textbf{\textit{ADer}} will become a valuable resource for researchers and practitioners, promoting the development of more robust and generalizable anomaly detection systems. The full code is attached in the Appendix and open-sourced at \url{https://github.com/zhangzjn/ader}.
