Anomaly detection has the potential to discover new physics in unexplored regions of the data. However, choosing the best anomaly detector for a given data set in a model-agnostic way is an important challenge that has so far been largely neglected. In this paper, we introduce the data-driven ARGOS metric, which has a sound theoretical foundation and is empirically shown to robustly select the most sensitive anomaly detection model given the data. Focusing on weakly supervised, classifier-based anomaly detection methods, we show that the ARGOS metric outperforms other model selection metrics previously used in the literature, in particular the binary cross-entropy loss. We explore several realistic applications, including hyperparameter tuning as well as architecture and feature selection, and in all cases we demonstrate that ARGOS is robust to the noisy conditions of anomaly detection.
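The abstract does not define the ARGOS metric, so the sketch below illustrates only the baseline it is compared against: ranking candidate weakly supervised classifiers by their validation binary cross-entropy loss and keeping the one with the lowest value. It is a minimal sketch under stated assumptions. Label 1 is assumed to mark events from the signal region and label 0 events from a background-like reference sample. The function names `binary_cross_entropy` and `select_by_bce` and the candidate names are hypothetical.

```python
import math
from typing import Dict, Sequence


def binary_cross_entropy(labels: Sequence[int], scores: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy of classifier scores against (weak) 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, scores):
        # Clip probabilities away from 0 and 1 so the logarithms stay finite.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def select_by_bce(labels: Sequence[int], candidate_scores: Dict[str, Sequence[float]]) -> str:
    """Hypothetical baseline selector: return the candidate with the lowest validation BCE."""
    return min(candidate_scores, key=lambda name: binary_cross_entropy(labels, candidate_scores[name]))


# Assumed weak labels: 1 = signal-region event, 0 = reference (background-like) event.
val_labels = [1, 1, 0, 0]
candidates = {
    "model_a": [0.9, 0.8, 0.2, 0.1],
    "model_b": [0.6, 0.5, 0.5, 0.4],
}
print(select_by_bce(val_labels, candidates))  # prints "model_a"
```

With weak labels, the loss measures how well a classifier separates the two samples rather than signal from background directly. This indirection is why a low validation loss need not track anomaly-detection sensitivity, and it is the motivation for a dedicated selection metric.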