Streaming Anomaly Detection

Anomaly detection is critical for finding suspicious behavior in innumerable systems. We need to detect anomalies in real time, i.e., to determine whether an incoming entity is anomalous as soon as we receive it, in order to minimize the effects of malicious activities and start recovery as soon as possible. Therefore, online algorithms that can detect anomalies in a streaming manner are essential. We first propose MIDAS, which uses a count-min sketch to detect anomalous edges in dynamic graphs in an online manner, using constant time and memory. We then propose two variants: MIDAS-R, which incorporates temporal and spatial relations, and MIDAS-F, which aims to filter away anomalous edges to prevent them from negatively affecting the internal data structures. We then extend the count-min sketch to a Higher-Order sketch to capture complex relations in graph data and to reduce the problem of detecting suspicious dense subgraphs to that of finding a dense submatrix in constant time. Using this sketch, we propose four streaming methods to detect edge and subgraph anomalies. Next, we broaden the graph setting to multi-aspect data. We propose MStream, which detects explainable anomalies in multi-aspect data streams. We further propose MStream-PCA, MStream-IB, and MStream-AE to incorporate correlation between features. Finally, we consider multi-dimensional data streams with concept drift and propose MemStream. MemStream leverages a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in the data without the need for labels. We prove a theoretical bound on the size of memory for effective drift handling. In addition, we allow quick retraining when the arriving stream becomes sufficiently different from the training data. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning.
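
To make the count-min sketch mentioned above concrete, the following is a minimal Python sketch of the data structure applied to edge counts in a stream. It is an illustration only, not the MIDAS implementation: the class name `CountMinSketch`, the helper `edge_key`, the parameters `depth` and `width`, and the choice of BLAKE2b as the hash are all assumptions made for this example. The point it shows is that memory stays fixed at `depth * width` counters regardless of how many edges arrive, and that each update or query touches only `depth` counters.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate counter (hypothetical illustration, not MIDAS itself)."""

    def __init__(self, depth=4, width=1024):
        # depth = number of hash rows, width = counters per row (assumed defaults).
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Derive one hash function per row by prefixing the key with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate counters, so the minimum over rows
        # never underestimates the true count.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


def edge_key(src, dst):
    # Hypothetical encoding of a directed edge as a string key.
    return f"{src}->{dst}"


sketch = CountMinSketch()
stream = [("a", "b"), ("a", "b"), ("c", "d"), ("a", "b")]
for src, dst in stream:
    sketch.add(edge_key(src, dst))

ab_estimate = sketch.estimate(edge_key("a", "b"))
cd_estimate = sketch.estimate(edge_key("c", "d"))
```

The estimates are upper bounds on the true edge counts; a larger `width` reduces the chance of collisions, and a larger `depth` reduces the chance that every row collides for the same key.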
