Anomaly Detection with Score Distribution Discrimination

From arXiv; accepted by KDD 2023. Detailed discussions can be found at https://openreview.net/forum?id=P1Worw-M1Tf

Recent studies have paid more attention to anomaly detection (AD) methods that can leverage a handful of labeled anomalies along with abundant unlabeled data. These anomaly-informed AD methods rely on manually predefined score targets, e.g., a prior constant or margin hyperparameter(s), to make the anomaly scores of normal and abnormal data discriminable. However, such methods are vulnerable to anomaly contamination in the unlabeled data and do not adapt to different data scenarios. In this paper, we propose to optimize the anomaly scoring function from the perspective of score distributions, which better retains the diversity and fine-grained information of the input data, especially in the more practical AD scenario where the unlabeled data contains anomalous noise. We design a novel loss function, called Overlap loss, that minimizes the overlap area between the score distributions of normal and abnormal samples; because it no longer depends on prior anomaly score targets, it adapts to various datasets. Overlap loss consists of two components, a Score Distribution Estimator and an Overlap Area Calculation, which respectively overcome the challenges of estimating arbitrary score distributions and ensure that the training loss is bounded. As a general loss component, Overlap loss can be integrated into multiple network architectures to construct AD models. Extensive experimental results indicate that Overlap-loss-based AD models significantly outperform their state-of-the-art counterparts and achieve better performance on different types of anomalies.
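To make the idea of "minimizing the overlap area between score distributions" concrete, here is a minimal sketch, not the authors' implementation. It assumes Gaussian kernel density estimation (KDE) with a fixed, hypothetical bandwidth as a stand-in for the Score Distribution Estimator, and numerically integrates min(f_normal, f_anomaly) as a stand-in for the Overlap Area Calculation. The function names `kde_pdf` and `overlap_loss` and their parameters are illustrative only. The sketch computes just the loss value. For training, the same computation would be written in an autodiff framework so that gradients reach the scoring network.

```python
import math
from typing import Sequence


def kde_pdf(x: float, samples: Sequence[float], bandwidth: float) -> float:
    """Gaussian kernel density estimate of the score density at point x."""
    coef = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return coef * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)


def overlap_loss(normal_scores: Sequence[float],
                 anomaly_scores: Sequence[float],
                 bandwidth: float = 0.5,   # hypothetical fixed KDE bandwidth
                 grid_size: int = 2001) -> float:
    """Hypothetical sketch: overlap area between two KDE score densities.

    Integrates min(f_normal, f_anomaly) with the trapezoidal rule on a grid that
    extends 4 bandwidths past the observed scores. Up to numerical error the
    result lies in [0, 1]: near 0 for well-separated score distributions and
    near 1 for identical ones, so the loss is bounded.
    """
    if not normal_scores or not anomaly_scores:
        raise ValueError("both score sets must be non-empty")
    lo = min(min(normal_scores), min(anomaly_scores)) - 4.0 * bandwidth
    hi = max(max(normal_scores), max(anomaly_scores)) + 4.0 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    # Pointwise minimum of the two estimated densities on the grid.
    overlap = [min(kde_pdf(lo + i * step, normal_scores, bandwidth),
                   kde_pdf(lo + i * step, anomaly_scores, bandwidth))
               for i in range(grid_size)]
    # Trapezoidal rule: interior points weighted 1, endpoints weighted 1/2.
    return step * (sum(overlap) - 0.5 * (overlap[0] + overlap[-1]))


if __name__ == "__main__":
    normal = [0.10, 0.20, 0.15, 0.30, 0.25]   # scores of (mostly) normal samples
    anomalies = [0.90, 1.10, 1.00]            # scores of labeled anomalies
    print(overlap_loss(normal, anomalies, bandwidth=0.1))
```

Note that no score target appears anywhere: the loss falls only because the two estimated distributions move apart. This matches the abstract's point that Overlap loss does not rely on a prior constant or margin. When the densities cross exactly once, at a point c with normal scores to the left, the same area equals F_a(c) + 1 - F_n(c) for the two CDFs. This avoids the grid, and the paper's Overlap Area Calculation may compute the area in a different way than this sketch.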
