Time series discords are a useful primitive for time series anomaly detection, and the matrix profile captures discords effectively. Many research efforts have improved the scalability of discord discovery with respect to the length of the time series. However, there is surprisingly little work on reducing the time complexity of matrix profile computation with respect to the dimensionality of a multidimensional time series. In this work, we propose a sketch for discord mining among multidimensional time series. After an initial pre-processing step that builds the sketch in time comparable to reading the data, discord mining has a runtime independent of the dimensionality of the original data. On several real-world examples from water treatment and transportation, the proposed algorithm improves throughput by at least an order of magnitude (50X) with only minimal impact on the quality of the approximate solution. Additionally, the proposed method can handle the dynamic addition or deletion of dimensions with inconsequential overhead. This allows a data analyst to consider "what-if" scenarios in real time while exploring the data.