Streaming data is now generated continuously and at scale across diverse domains. Yet a significant portion of this data remains unlabeled, which makes it difficult to identify infrequent events such as anomalies. The difficulty is amplified in non-stationary environments, where model performance can degrade over time due to concept drift. To address these challenges, this paper introduces a new method, VAE4AS (Variational Autoencoder for Anomalous Sequences). VAE4AS integrates incremental learning with a dual drift detection mechanism that employs both a statistical test and a distance-based test; anomaly detection itself is performed by a Variational Autoencoder. To gauge the effectiveness of VAE4AS, a comprehensive experimental study is conducted on real-world and synthetic datasets characterized by anomaly rates below 10% and recurrent drift. The results show that the proposed method surpasses both robust baselines and state-of-the-art techniques, providing evidence of its efficacy in addressing some of the challenges associated with anomalous sequence detection in non-stationary streaming data.
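The idea of pairing a statistical test with a distance-based test for drift detection can be illustrated with a minimal sketch. The abstract does not specify which tests VAE4AS uses, so everything concrete here is an assumption made for illustration: a two-sample Kolmogorov-Smirnov test stands in for the statistical test, a mean shift normalized by the reference standard deviation stands in for the distance-based test, and the function names, thresholds, and window contents are hypothetical. The windows could hold raw feature values or, for example, reconstruction errors from an autoencoder.

```python
import math
from statistics import mean, pstdev


def ks_statistic(ref, cur):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref_sorted, cur_sorted = sorted(ref), sorted(cur)
    n, m = len(ref_sorted), len(cur_sorted)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref_sorted[i], cur_sorted[j])
        # Advance both samples past every value <= x, then compare the CDFs at x.
        while i < n and ref_sorted[i] <= x:
            i += 1
        while j < m and cur_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_drift(ref, cur, alpha=0.01):
    """Statistical check (assumed): compare the KS statistic to its asymptotic critical value."""
    n, m = len(ref), len(cur)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(ref, cur) > critical


def distance_drift(ref, cur, threshold=1.0):
    """Distance check (assumed): shift of the window mean in units of the reference std."""
    spread = pstdev(ref) or 1.0  # guard against a constant reference window
    return abs(mean(cur) - mean(ref)) / spread > threshold


def detect_drift(ref, cur):
    """Report which of the two checks fired for a reference/current window pair."""
    return {"statistical": ks_drift(ref, cur), "distance": distance_drift(ref, cur)}


# Hypothetical windows: a reference window, a reordered copy, and a shifted copy.
reference = [i % 10 for i in range(200)]
unchanged = list(reversed(reference))
shifted = [x + 20 for x in reference]

print(detect_drift(reference, unchanged))
print(detect_drift(reference, shifted))
```

In a streaming setting, a positive result from either check would typically trigger an incremental update of the model on recent data. How the two signals are combined (either, both, or weighted) is a design choice that this sketch leaves open.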