Discovering subsequence anomalies in time series remains one of the most topical research problems. A subsequence anomaly is a run of successive points in time that are collectively abnormal, although each individual point is not necessarily an outlier. Among the many approaches to discovering subsequence anomalies, the discord concept is considered one of the best. A time series discord is intuitively defined as a subsequence of a given length that is maximally far away from its non-overlapping nearest neighbor. The recently introduced MERLIN algorithm discovers time series discords of every possible length in a specified range, thereby eliminating the need to set even that sole parameter, the discord length. However, MERLIN is serial, and parallelizing it could speed up discord discovery. In this article, we introduce a novel parallelization scheme for GPUs called PALMAD (Parallel Arbitrary Length MERLIN-based Anomaly Discovery). Unlike its serial predecessor, PALMAD employs recurrent formulas that we have derived to avoid redundant calculations, and advanced data structures for efficient parallel processing. Experimental evaluation over real-world and synthetic time series shows that our algorithm outperforms parallel analogs. We also apply PALMAD to discover anomalies in a real-world time series, employing our proposed discord heatmap technique to illustrate the results.
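To make the discord definition concrete, the following is a minimal brute-force sketch in Python. It is neither MERLIN nor PALMAD: it simply scans all pairs of subsequences of one fixed length. The function names `znorm` and `brute_force_discord` are hypothetical, and the use of the z-normalized Euclidean distance is an assumption made here for illustration.

```python
import numpy as np


def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def brute_force_discord(ts, m):
    """Return (start index, nearest-neighbor distance) of the top-1 discord.

    Illustrative O(n^2 * m) reference only; assumes z-normalized
    Euclidean distance between subsequences of length m.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1  # number of subsequences of length m
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        # Mask every subsequence that overlaps subsequence i
        # (|i - j| < m), which also masks i itself.
        lo, hi = max(0, i - m + 1), min(n, i + m)
        d[lo:hi] = np.inf
        nn = d.min()  # distance to the non-overlapping nearest neighbor
        if np.isfinite(nn) and nn > best_dist:
            best_idx, best_dist = i, nn
    return best_idx, best_dist


# Usage: a periodic signal with one flattened half-cycle as the anomaly.
t = np.arange(200)
ts = np.sin(2 * np.pi * t / 20)
ts[100:110] = 0.0
idx, dist = brute_force_discord(ts, m=20)
```

In this toy series, the returned start index falls in a window that covers the flattened region. Calling the function once per length in a range gives a naive arbitrary-length search, which recomputes everything from scratch for each length.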