Time series discords are a useful primitive for time series anomaly detection, and the matrix profile captures discords effectively. Many research efforts have improved the scalability of discord discovery with respect to the length of the time series. However, there is surprisingly little work on reducing the time complexity of matrix profile computation with respect to the dimensionality of a multidimensional time series. In this work, we propose a sketch for discord mining among multidimensional time series. After an initial pre-processing step that builds the sketch, which is as fast as reading the data, discord mining has a runtime independent of the dimensionality of the original data. On several real-world examples from water treatment and transportation, the proposed algorithm improves throughput by at least an order of magnitude (50X) with only minimal impact on the quality of the approximated solution. Additionally, the proposed method can handle the dynamic addition or deletion of dimensions with inconsequential overhead. This allows a data analyst to consider "what-if" scenarios in real time while exploring the data.
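To make the idea concrete, here is a minimal sketch of discord mining on a sketched multidimensional series. The abstract does not describe how the sketch is built, so the construction below is an assumption: a count-sketch-style scheme that assigns each dimension to one of `k` groups with a random sign and sums them. The brute-force `matrix_profile`, the exclusion zone of `m // 2`, and the helper names `build_sketch` and `top_discord` are likewise illustrative choices, not the authors' implementation. A discord is taken, as usual, to be the subsequence whose z-normalized nearest-neighbor distance is largest.

```python
import numpy as np

def matrix_profile(x, m):
    """Brute-force matrix profile of a 1-D series: for every length-m
    subsequence, the z-normalized Euclidean distance to its nearest
    non-trivial neighbor."""
    windows = np.lib.stride_tricks.sliding_window_view(x, m)
    mu = windows.mean(axis=1, keepdims=True)
    sigma = windows.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0                      # flat windows: avoid 0/0
    subs = (windows - mu) / sigma
    sq = (subs ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * subs @ subs.T, 0.0)
    idx = np.arange(len(subs))
    # Exclusion zone (assumed m // 2) removes trivial self-matches.
    d2[np.abs(idx[:, None] - idx[None, :]) <= m // 2] = np.inf
    return np.sqrt(d2.min(axis=1))

def build_sketch(data, groups, signs, k):
    """Assumed sketch: sum the signed dimensions of `data` (shape d x n)
    into k series. One pass over the data, so as fast as reading it."""
    sk = np.zeros((k, data.shape[1]))
    np.add.at(sk, groups, signs[:, None] * data)
    return sk

def top_discord(sk, m):
    """Return (score, group, start) of the largest matrix-profile value
    over the k sketched series. Cost depends on k, not on d."""
    best = (-np.inf, -1, -1)
    for g, series in enumerate(sk):
        mp = matrix_profile(series, m)
        i = int(np.argmax(mp))
        if mp[i] > best[0]:
            best = (float(mp[i]), g, i)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, m, k = 8, 400, 20, 3
    t = np.arange(n)
    phases = np.linspace(0, np.pi, d, endpoint=False)
    data = np.array([(1 + 0.1 * j) * np.sin(2 * np.pi * t / 20 + p)
                     for j, p in enumerate(phases)])
    data[3, 200:210] += 1.5                      # inject an anomaly
    groups = rng.integers(0, k, size=d)
    signs = rng.choice(np.array([-1.0, 1.0]), size=d)
    sk = build_sketch(data, groups, signs, k)
    print(top_discord(sk, m))
    # "What-if": drop dimension 0 without rebuilding the sketch.
    sk[groups[0]] -= signs[0] * data[0]
    print(top_discord(sk, m))
```

Because this sketch is linear in the input dimensions, adding or deleting a dimension reduces to adding or subtracting one signed series from a single group, which is one way the "what-if" exploration described above could be supported cheaply.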