Quantifying similarities between time series in a meaningful way remains a challenge in time series analysis, despite many advances in the field. Most real-world solutions still rely on a few popular measures, such as Euclidean Distance (EuD), Longest Common Subsequence (LCSS), and Dynamic Time Warping (DTW). The strengths and weaknesses of these measures have been studied extensively, and incremental improvements have been proposed. In this study, however, we present a different similarity measure that fuses the notion of Dubuc's variation from fractal analysis with the Intersection-over-Union (IoU) measure, also known as the Jaccard index, which is widely used in object recognition. In this proof-of-concept paper, we introduce the Multiscale Dubuc Distance (MDD) measure and prove that it is a metric, possessing desirable properties such as the triangle inequality. We use 95 datasets from the UCR Time Series Classification Archive to compare MDD's performance with that of EuD, LCSS, and DTW. Our experiments show that MDD's overall performance, without any case-specific customization, is comparable to that of DTW with window sizes optimized per dataset. We also highlight several datasets where MDD's performance improves significantly when its single parameter is customized. This customization serves as a powerful tool for gauging MDD's sensitivity to noise. Lastly, we show that MDD's running time is linear in the length of the time series, which is crucial for real-world applications involving very large datasets.
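The abstract does not define MDD, so no attempt is made to reproduce it here. As a point of reference for two of the baselines it names (EuD and windowed DTW) and for the IoU/Jaccard idea that MDD builds on, the following is a minimal Python sketch. All function names and the `window` parameter are illustrative assumptions, not the authors' code, and the Jaccard distance is shown on plain finite sets rather than on whatever regions MDD actually compares.

```python
import math


def euclidean(a, b):
    """Euclidean distance (EuD) between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("EuD requires series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b, window=None):
    """Dynamic Time Warping with an optional band of half-width `window`.

    `window=None` means unconstrained warping; `window=0` on equal-length
    series reduces to the Euclidean distance. Hypothetical helper, written
    for illustration only.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return math.sqrt(cost[n][m])


def jaccard_distance(s, t):
    """1 - IoU of two finite sets; two empty sets are at distance 0."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 0.0
    return 1.0 - len(s & t) / len(union)


if __name__ == "__main__":
    a = [0, 0, 1, 2, 1, 0]
    b = [0, 1, 2, 1, 0, 0]  # the same bump, shifted one step earlier
    print(euclidean(a, b))
    print(dtw(a, b))
    print(dtw(a, b, window=0))
    print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```

In this toy pair, unconstrained DTW absorbs the one-step shift that EuD penalizes, and shrinking the window to zero collapses DTW back to EuD, which is why the window size matters as a per-dataset tuning knob. The quadratic table in `dtw` also contrasts with the linear running time claimed for MDD. The Jaccard distance is a true metric, whereas DTW does not satisfy the triangle inequality in general.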