Time series discords are a useful primitive for time series anomaly detection, and the matrix profile captures discords effectively. Many research efforts have improved the scalability of discord discovery with respect to the length of the time series. However, there is surprisingly little work on reducing the time complexity of matrix profile computation with respect to the dimensionality of a multidimensional time series. In this work, we propose a sketch for discord mining among multidimensional time series. After an initial pre-processing step that builds the sketch as fast as reading the data, discord mining has a runtime independent of the dimensionality of the original data. On several real-world examples from water treatment and transportation, the proposed algorithm improves throughput by at least an order of magnitude (50X) with only minimal impact on the quality of the approximated solution. Additionally, the proposed method can handle the dynamic addition or deletion of dimensions with inconsequential overhead. This allows a data analyst to consider "what-if" scenarios in real time while exploring the data.
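The workflow described in the abstract (one pass to build a sketch, discord mining whose cost does not depend on the original dimensionality, and cheap addition or deletion of a dimension) can be illustrated with a minimal sketch. It assumes a count-sketch-style construction in which each dimension is assigned a random bucket and a random sign; the abstract does not specify the actual construction, so this may differ from the proposed method. The names `build_sketch`, `update_sketch`, `matrix_profile`, and `top_discord` are hypothetical, and the matrix profile is computed by brute force for clarity rather than speed.

```python
import numpy as np


def build_sketch(X, k, seed=0):
    """Fold the d rows of X (shape d x n) into k rows in one pass over the data."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    bucket = rng.integers(0, k, size=d)      # assumed: random sketch row per dimension
    sign = rng.choice([-1.0, 1.0], size=d)   # assumed: random sign per dimension
    S = np.zeros((k, n))
    np.add.at(S, bucket, sign[:, None] * X)  # unbuffered add handles repeated buckets
    return S, bucket, sign


def update_sketch(S, series, j, bucket, sign, add=True):
    """Add or delete dimension j in O(n) time, without rebuilding the sketch."""
    S[bucket[j]] += (1.0 if add else -1.0) * sign[j] * series


def matrix_profile(t, m):
    """Brute-force z-normalized matrix profile of a 1-D series (O(n^2) memory)."""
    W = np.lib.stride_tricks.sliding_window_view(t, m)
    mu = W.mean(axis=1, keepdims=True)
    sd = W.std(axis=1, keepdims=True)
    sd[sd < 1e-12] = 1.0                     # guard against (near-)constant windows
    Z = (W - mu) / sd
    sq = (Z ** 2).sum(axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    idx = np.arange(len(Z))
    # Exclusion zone: ignore trivial matches with heavily overlapping windows.
    D2[np.abs(idx[:, None] - idx[None, :]) <= m // 2] = np.inf
    return np.sqrt(D2.min(axis=1))


def top_discord(S, m):
    """Return (score, sketch_row, start_index) of the largest profile value.

    The cost depends on the k sketch rows and the length n, not on the
    original number of dimensions d.
    """
    best = (-np.inf, -1, -1)
    for row, series in enumerate(S):
        mp = matrix_profile(series, m)
        i = int(np.argmax(mp))
        if mp[i] > best[0]:
            best = (float(mp[i]), row, i)
    return best
```

Under these assumptions, a "what-if" query such as "what is the top discord if sensor `j` is ignored?" amounts to one call to `update_sketch(..., add=False)` followed by `top_discord` on the same `k` rows, so the original `d` dimensions are never revisited.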