Semi-supervised anomaly detection, which aims to improve the performance of an anomaly detector by using a small amount of anomaly data in addition to unlabeled data, has attracted attention. Existing semi-supervised approaches assume that unlabeled data are mostly normal: they train the anomaly detector to minimize the anomaly scores for the unlabeled data and to maximize those for the anomaly data. In practice, however, the unlabeled data are often contaminated with anomalies. This weakens the effect of maximizing the anomaly scores for anomalies and prevents the detection performance from improving. To solve this problem, we propose the positive-unlabeled autoencoder, which combines positive-unlabeled learning with an anomaly detector such as the autoencoder. With our approach, the anomaly scores for normal data can be approximated using the unlabeled and anomaly data. Therefore, even without labeled normal data, we can train the anomaly detector to minimize the anomaly scores for normal data and to maximize those for the anomaly data. In addition, our approach is applicable to various anomaly detectors, such as DeepSVDD. Experiments on various datasets show that our approach achieves better detection performance than existing approaches.
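The step of approximating normal-data anomaly scores from unlabeled and anomaly data follows from treating the unlabeled set as a mixture. The following is a minimal sketch, assuming the anomaly ratio alpha in the unlabeled data is known: if p_U = (1 - alpha) p_N + alpha p_A, then the mean score over normal data is (E_U[s] - alpha E_A[s]) / (1 - alpha). The function name, the known-alpha assumption, and the clip at zero (borrowed from non-negative PU learning) are illustrative assumptions, not necessarily the exact loss used by the positive-unlabeled autoencoder.

```python
from statistics import fmean


def pu_normal_score_estimate(unlabeled_scores, anomaly_scores, alpha, clip=True):
    """Estimate the mean anomaly score of normal data without normal labels.

    Hypothetical helper. Assumes the unlabeled data follow the mixture
    p_U = (1 - alpha) * p_N + alpha * p_A, where alpha is the (assumed known)
    fraction of anomalies in the unlabeled data.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    mean_unlabeled = fmean(unlabeled_scores)  # E_U[s]
    mean_anomaly = fmean(anomaly_scores)      # E_A[s]
    # Remove the anomaly contribution from the mixture, then renormalize.
    estimate = (mean_unlabeled - alpha * mean_anomaly) / (1.0 - alpha)
    # Anomaly scores such as reconstruction errors are non-negative, so a
    # negative finite-sample estimate is clipped (assumption, nnPU-style).
    return max(0.0, estimate) if clip else estimate


if __name__ == "__main__":
    normal_scores = [1.0] * 8   # toy scores for normal points
    anomaly_scores = [5.0, 5.0] # toy scores for labeled anomalies
    unlabeled = normal_scores + anomaly_scores  # 20% contamination
    print("naive unlabeled mean:", fmean(unlabeled))
    print("PU estimate of normal mean:",
          pu_normal_score_estimate(unlabeled, anomaly_scores, alpha=0.2))
```

In this toy case the naive mean of the unlabeled scores (1.8) is inflated by the contaminating anomalies, whereas the PU-style estimate recovers roughly the true normal mean of 1.0. Minimizing the naive mean would therefore also push anomaly scores down, which is the failure mode the abstract describes.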