The area under the receiver operating characteristic curve (AUC) is the standard measure for comparing anomaly detectors. Its advantage is that it provides a single scalar that induces a natural ordering and is independent of a threshold, which allows the choice of threshold to be postponed. In this work, we question whether AUC is a good metric for anomaly detection, or whether it gives a false sense of comfort because it relies on assumptions that are unlikely to hold in practice. Our investigation shows that variants of AUC emphasizing accuracy at low false positive rates seem to be better correlated with the needs of practitioners, but also that we can compare anomaly detectors only when we have representative examples of anomalous samples. This last result is disturbing, as it suggests that in many cases we should do active or few-shot learning instead of pure anomaly detection.
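To make the distinction between the full AUC and a variant that emphasizes low false positive rates concrete, the following is a minimal sketch in plain Python. It is not the evaluation code of this work: the function names `roc_points` and `auc`, the `max_fpr` parameter, the cut-off of 0.25, and the toy scores are all assumptions chosen for illustration. The partial area here is the raw, unnormalized trapezoidal area over FPR in [0, max_fpr], which is only one of several possible low-FPR variants.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr); labels use 1 = anomaly, 0 = normal."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort indices by descending score: a higher score means "more anomalous".
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Consume all samples sharing one score so ties form a single segment.
        s = scores[order[i]]
        while i < len(order) and scores[order[i]] == s:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve restricted to FPR <= max_fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): four anomalies followed by four normal samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Detector A ranks two anomalies above every normal sample and two below.
scores_a = [0.9, 0.8, 0.3, 0.2, 0.7, 0.6, 0.5, 0.4]
# Detector B ranks two normal samples above every anomaly.
scores_b = [0.7, 0.6, 0.5, 0.4, 0.9, 0.8, 0.3, 0.2]

full_a = auc(roc_points(scores_a, labels))
full_b = auc(roc_points(scores_b, labels))
partial_a = auc(roc_points(scores_a, labels), max_fpr=0.25)
partial_b = auc(roc_points(scores_b, labels), max_fpr=0.25)

print("full AUC:        A =", full_a, " B =", full_b)
print("AUC at FPR<=0.25: A =", partial_a, " B =", partial_b)
```

On this toy data both detectors have the same full AUC, yet only detector A finds any anomalies before its first false alarms, so the low-FPR area separates them where the full AUC cannot. The toy data also assumes that labeled anomalies are available, which is exactly the condition the abstract identifies as necessary for comparing detectors.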