Anomaly detection methods have demonstrated remarkable success across various applications. However, assessing their performance, particularly at the pixel level, is challenging because of the severe imbalance that typically exists between normal and abnormal samples. Commonly adopted evaluation metrics for pixel-level detection may not effectively capture the nuanced performance variations arising from this class imbalance. In this paper, we dissect this challenge using visual evidence and statistical analysis, and examine the need for evaluation metrics that account for the imbalance. We offer insights into more accurate metrics by evaluating eleven leading contemporary anomaly detection methods on twenty-one anomaly detection problems. From this extensive experimental evaluation, we conclude that Precision-Recall-based metrics better capture relative method performance, making them more suitable for the task.
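To make the imbalance argument concrete, the following is a minimal sketch, not the evaluation protocol of this paper. It assumes that AUROC stands in for a "commonly adopted" metric and that average precision (AP) stands in for a Precision-Recall-based one. The two detectors, the 1% anomaly rate, and the helper names `rank_metrics` and `make_detector` are hypothetical and chosen only for illustration.

```python
def rank_metrics(scores, labels):
    """Return (AUROC, average precision) for binary labels.

    Assumes higher score = more anomalous and that no two scores are tied.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    concordant = 0       # (anomalous, normal) pairs ranked in the right order
    precision_sum = 0.0  # precision accumulated at each detected anomaly
    for i in order:
        if labels[i] == 1:
            tp += 1
            concordant += n_neg - fp  # normal pixels ranked below this anomaly
            precision_sum += tp / (tp + fp)
        else:
            fp += 1
    return concordant / (n_pos * n_neg), precision_sum / n_pos


def make_detector(n_false_alarms, n_pos=100, n_neg=9900):
    """Hypothetical detector: n_false_alarms normal pixels outscore every anomaly."""
    labels = [0] * n_false_alarms + [1] * n_pos + [0] * (n_neg - n_false_alarms)
    n = len(labels)
    scores = [(n - rank) / n for rank in range(n)]  # strictly decreasing scores
    return scores, labels


# Detector A raises 10 false alarms above the anomalies, detector B raises 100.
auroc_a, ap_a = rank_metrics(*make_detector(10))
auroc_b, ap_b = rank_metrics(*make_detector(100))
print(f"A: AUROC={auroc_a:.3f}  AP={ap_a:.3f}")
print(f"B: AUROC={auroc_b:.3f}  AP={ap_b:.3f}")
```

In this toy setting the two detectors differ by less than 0.01 in AUROC (roughly 0.999 versus 0.990), while their AP differs by more than 0.4 (roughly 0.76 versus 0.31). The large pool of normal pixels dilutes the false alarms in the ROC view, whereas precision compares them directly against the few anomalous pixels.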