Streaming data is now generated in vast quantities across many domains. However, much of this data is unlabeled, which makes it challenging to identify events, particularly anomalies. The task becomes even harder in nonstationary environments, where model performance can deteriorate over time due to concept drift. To address these challenges, this paper presents a novel method, VAE++ESDD, which employs incremental learning and two-level ensembling: an ensemble of Variational AutoEncoders (VAEs) for anomaly prediction, along with an ensemble of concept drift detectors. Each drift detector uses a statistics-based concept drift mechanism. To evaluate the effectiveness of VAE++ESDD, we conduct a comprehensive experimental study using real-world and synthetic datasets characterized by severely or extremely low anomaly rates and various drift characteristics. Our study reveals that the proposed method significantly outperforms both strong baselines and state-of-the-art methods.
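To make the second ensembling level more concrete, the following is a minimal sketch of an ensemble of statistics-based drift detectors that monitors a stream of loss values. It is not the authors' implementation. The abstract does not state which statistical test, window sizes, or voting rule VAE++ESDD uses, so the two-sample z-test, the majority vote, the reset-on-drift behaviour, and every name here (`WindowDriftDetector`, `DriftDetectorEnsemble`, `simulate_losses`) are illustrative assumptions. The VAE ensemble is replaced by simulated reconstruction losses to keep the example self-contained.

```python
import math
import random
from collections import deque
from statistics import NormalDist, fmean, variance


class WindowDriftDetector:
    """Hypothetical detector: compares a frozen reference window of losses
    with a sliding window of recent losses using a two-sample z-test."""

    def __init__(self, window_size, alpha):
        self.window_size = window_size
        self.alpha = alpha
        self.reference = []                      # filled first, then frozen
        self.recent = deque(maxlen=window_size)  # sliding window of new losses

    def update(self, loss):
        """Feed one loss value; return True if this detector flags drift."""
        if len(self.reference) < self.window_size:
            self.reference.append(loss)
            return False
        self.recent.append(loss)
        if len(self.recent) < self.window_size:
            return False
        n = self.window_size
        se = math.sqrt(variance(self.reference) / n + variance(self.recent) / n)
        if se == 0.0:
            return fmean(self.reference) != fmean(self.recent)
        z = (fmean(self.recent) - fmean(self.reference)) / se
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
        return p_value < self.alpha

    def reset(self):
        """Forget both windows so a new reference is learned after drift."""
        self.reference = []
        self.recent.clear()


class DriftDetectorEnsemble:
    """Assumed combination rule: drift is declared on a majority vote."""

    def __init__(self, window_sizes, alpha):
        self.detectors = [WindowDriftDetector(w, alpha) for w in window_sizes]
        self.min_votes = len(self.detectors) // 2 + 1

    def update(self, loss):
        votes = sum(d.update(loss) for d in self.detectors)
        if votes >= self.min_votes:
            for d in self.detectors:
                d.reset()
            return True
        return False


def simulate_losses(seed=0, n_per_concept=300):
    """Stand-in for VAE reconstruction losses: the mean jumps at the drift point."""
    rng = random.Random(seed)
    before = [rng.gauss(1.0, 0.1) for _ in range(n_per_concept)]
    after = [rng.gauss(2.0, 0.1) for _ in range(n_per_concept)]
    return before + after


# alpha is deliberately tiny because the test is repeated at every time step.
ensemble = DriftDetectorEnsemble(window_sizes=[20, 40, 60], alpha=1e-6)
alarms = [t for t, loss in enumerate(simulate_losses()) if ensemble.update(loss)]
print("drift flagged at steps:", alarms)
```

In an incremental-learning setting, a drift alarm would typically trigger retraining or replacement of the anomaly models on recent data. How VAE++ESDD reacts to an alarm is not described in the abstract, so that step is omitted here.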