In the contemporary digital landscape, streaming data is generated continuously and at scale across diverse domains. Yet much of this data is unlabeled, which makes it difficult to identify infrequent events such as anomalies. The difficulty is compounded in non-stationary environments, where concept drift can degrade model performance over time. To address these challenges, this paper introduces VAE4AS (Variational Autoencoder for Anomalous Sequences), a method that combines incremental learning with a dual drift detection mechanism consisting of a statistical test and a distance-based test. Anomaly detection itself is performed by a Variational Autoencoder. To evaluate VAE4AS, a comprehensive experimental study is conducted on real-world and synthetic datasets with anomaly rates below 10\% and recurrent drift. The results show that the proposed method outperforms both robust baselines and state-of-the-art techniques, providing compelling evidence of its effectiveness in addressing some of the challenges of anomalous sequence detection in non-stationary streaming data.
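The abstract does not say which statistical test or which distance VAE4AS uses, so the following is only a minimal sketch of what a dual drift check over a stream could look like, not the paper's method. It assumes the detector compares a reference window against the current window of scalar anomaly scores (for example, VAE reconstruction errors). The two-sample Kolmogorov-Smirnov test and the standardized mean distance are swapped-in stand-ins. All function names, the `alpha` and `threshold` defaults, and the rule that both tests must agree are hypothetical.

```python
import bisect
import math
import statistics


def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(ref), sorted(cur)
    n, m = len(a), len(b)
    # The ECDF gap only changes at observed values, so evaluating there is sufficient.
    return max(
        abs(bisect.bisect_right(a, v) / n - bisect.bisect_right(b, v) / m)
        for v in set(a) | set(b)
    )


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value of the two-sample KS test at significance level alpha."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


def statistical_drift(ref, cur, alpha=0.05):
    """Statistical test (stand-in): reject 'same distribution' at level alpha."""
    return ks_statistic(ref, cur) > ks_critical_value(len(ref), len(cur), alpha)


def distance_drift(ref, cur, threshold=2.0):
    """Distance-based test (stand-in): window means differ by > threshold reference std devs."""
    spread = statistics.stdev(ref) or 1e-12  # guard against a constant reference window
    return abs(statistics.fmean(cur) - statistics.fmean(ref)) / spread > threshold


def dual_drift(ref, cur, alpha=0.05, threshold=2.0):
    """Hypothetical combination rule: report drift only when both tests agree."""
    return statistical_drift(ref, cur, alpha) and distance_drift(ref, cur, threshold)


if __name__ == "__main__":
    ref = [0.1 * (i % 10) for i in range(100)]  # stable window of scores in [0, 0.9]
    shifted = [x + 5.0 for x in ref]            # same shape, shifted far upward
    print(dual_drift(ref, ref))      # False: identical windows
    print(dual_drift(ref, shifted))  # True: both tests fire
```

Requiring agreement between the two tests trades some sensitivity for fewer false alarms; an OR rule would react faster to drift. In an incremental setting, a positive result would typically trigger retraining or updating of the detector on the current window.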