The matrix profile (MP) is a data structure computed from a time series that encodes the information needed to locate motifs and discords, which correspond to recurring patterns and outliers, respectively. When a time series contains noise, the conventional approach is to pre-filter it, but this cannot be applied in unsupervised settings where patterns and outliers are not annotated. The resilience of the algorithm used to generate the MP when faced with noisy data remains unknown. We measure the similarity between the MP computed from the original time series and MPs computed from the same data with noise added under a range of parameter settings, including adding duplicates and adding irrelevant data. We use three real-world data sets drawn from diverse domains for these experiments. Based on dissimilarities between the MPs, our results suggest that MP generation is resilient to a small amount of noise introduced into the data, but this resilience disappears as the amount of noise increases.
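The comparison described in the abstract can be illustrated with a minimal sketch. It is not the experimental code behind the reported results, and it rests on several assumptions: the MP is computed by a naive quadratic scan over z-normalized subsequences rather than an optimized algorithm; noise is injected by overwriting a fraction of samples with random values, a simplification that keeps the series length fixed and stands in for the duplicate and irrelevant data mentioned above; and dissimilarity is taken as the mean absolute difference between two MPs. The names `matrix_profile`, `inject_noise`, and `mp_dissimilarity`, the window length, and the synthetic sine series are all hypothetical choices made for illustration.

```python
import numpy as np


def znorm(x):
    """Z-normalize a subsequence; a constant subsequence maps to zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def matrix_profile(ts, m):
    """Naive O(n^2) matrix profile: for each length-m subsequence, the
    z-normalized Euclidean distance to its nearest non-trivial neighbour."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = max(1, m // 2)  # exclusion zone (assumed half the window length)
    mp = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf  # skip trivial matches
        mp[i] = d.min()
    return mp


def inject_noise(ts, fraction, rng):
    """Assumed noise model: overwrite a random fraction of the samples with
    values drawn uniformly from the observed range of the series."""
    noisy = np.array(ts, dtype=float)
    k = int(round(fraction * len(noisy)))
    idx = rng.choice(len(noisy), size=k, replace=False)
    noisy[idx] = rng.uniform(noisy.min(), noisy.max(), size=k)
    return noisy


def mp_dissimilarity(a, b):
    """Assumed dissimilarity: mean absolute difference between two MPs."""
    return float(np.mean(np.abs(a - b)))


rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 200)
series = np.sin(t) + 0.05 * rng.standard_normal(200)  # synthetic stand-in data
base = matrix_profile(series, 20)

for frac in (0.0, 0.05, 0.25):
    noisy_mp = matrix_profile(inject_noise(series, frac, rng), 20)
    print(frac, mp_dissimilarity(base, noisy_mp))
```

Because the injected noise here preserves the series length, the two MPs align index by index and can be compared directly. Inserting duplicates or extra points, as in the experiments, would change the length and would require an alignment step or a length-independent summary before comparison.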