Anomaly detection is an important task for complex systems (e.g., industrial facilities, manufacturing, large-scale science experiments), where failures in a sub-system can lead to low yield, faulty products, or even damage to components. While complex systems often have a wealth of data, labeled anomalies are typically rare (or even nonexistent) and expensive to acquire. Unsupervised approaches are therefore common and typically search for anomalies by either the distance or the density of examples in the input feature space (or some associated low-dimensional representation). This paper presents a novel approach called CoAD, which is specifically designed for multi-modal tasks and identifies anomalies based on \textit{coincident} behavior across two different slices of the feature space. We define an \textit{unsupervised} metric, $\hat{F}_\beta$, by analogy to the supervised classification $F_\beta$ statistic. CoAD uses $\hat{F}_\beta$ to train an anomaly detection algorithm on \textit{unlabeled data}, based on the expectation that anomalous behavior in one feature slice is coincident with anomalous behavior in the other. The method is illustrated using a synthetic outlier data set and an MNIST-based image data set, and is compared to the prior state of the art on two real-world tasks: a metal milling data set and a data set from a particle accelerator.
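For reference, the supervised statistic on which $\hat{F}_\beta$ is modeled can be written in its standard form as below. The symbols $P$ (precision), $R$ (recall), and the counts $\mathrm{TP}$, $\mathrm{FP}$, $\mathrm{FN}$ (true positives, false positives, false negatives) are notation introduced here for illustration and do not appear in the abstract. The abstract does not state how $\hat{F}_\beta$ replaces these label-dependent quantities, so only the supervised form is shown.

$$F_\beta = (1+\beta^2)\,\frac{P\,R}{\beta^2 P + R}, \qquad P = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$

Setting $\beta = 1$ gives the harmonic mean of precision and recall; $\beta > 1$ weights recall more heavily, and $\beta < 1$ weights precision more heavily.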