How to pick the best anomaly detector?

Anomaly detection has the potential to discover new physics in unexplored regions of the data. However, choosing the best anomaly detector for a given data set in a model-agnostic way is an important challenge which has hitherto largely been neglected. In this paper, we introduce the data-driven ARGOS metric, which has a sound theoretical foundation and is empirically shown to robustly select the most sensitive anomaly detection model given the data. Focusing on weakly-supervised, classifier-based anomaly detection methods, we show that the ARGOS metric outperforms other model selection metrics previously used in the literature, in particular the binary cross-entropy loss. We explore several realistic applications, including hyperparameter tuning as well as architecture and feature selection, and in all cases we demonstrate that ARGOS is robust to the noisy conditions of anomaly detection.


Anomaly detection

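In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a data set. Anomalous items typically correspond to problems such as bank fraud, structural defects, medical conditions, or errors in text. Anomalies are also called outliers, novelties, noise, deviations, and exceptions. In abuse and network-intrusion detection in particular, the objects of interest are often not rare objects but unexpected bursts of activity. Such patterns do not fit the usual statistical definition of an outlier as a rare object, so many anomaly detection methods (unsupervised ones in particular) fail on this kind of data unless it has been aggregated appropriately. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters that these patterns form.

There are three broad categories of anomaly detection methods. [1] Unsupervised anomaly detection assumes that most instances in the data set are normal and detects anomalies in unlabeled test data by looking for the instances that fit the rest of the data least well. Supervised anomaly detection requires a data set labeled as "normal" and "abnormal" and involves training a classifier; the key difference from many other statistical classification problems is the inherent class imbalance of anomaly detection. Semi-supervised anomaly detection builds a model of normal behavior from a given normal training data set, then tests how likely it is that a test instance was generated by the learned model.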
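As a rough illustration of the semi-supervised approach described above, the sketch below fits a model to normal-only training data and flags test values that the model considers unlikely. The one-dimensional Gaussian model, the synthetic data, the 1st-percentile threshold, and all function names are assumptions made for this example; they do not come from the paper or from any particular library.

```python
import math
import random
import statistics


def fit_normal_model(normal_values):
    """Estimate the mean and standard deviation of normal-only training data."""
    return statistics.fmean(normal_values), statistics.stdev(normal_values)


def log_likelihood(x, mean, std):
    """Log-density of x under the fitted one-dimensional Gaussian."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


# Synthetic stand-in for a training set known to contain only normal instances.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
mean, std = fit_normal_model(train)

# Hypothetical threshold: roughly the 1st percentile of training log-likelihoods.
train_scores = sorted(log_likelihood(x, mean, std) for x in train)
threshold = train_scores[len(train_scores) // 100]

# A typical value and a far outlier; only the second falls below the threshold.
test = [0.1, 8.0]
is_anomaly = [log_likelihood(x, mean, std) < threshold for x in test]
```

In practice the Gaussian would usually be replaced by a richer density or reconstruction model, but the structure stays the same: learn from normal data only, score test instances, and compare the score against a threshold.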
