Measuring distance or similarity between time-series is fundamental to many applications, including classification and clustering. Existing measures may fail to capture similarities that arise from local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications in which time-series contain an ordered sequence of meaningful local trends, such as epidemics (a surge, followed by an increase, a peak, and a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, where each column represents local trends, and then applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in ensemble building and clustering of epidemic curves. We also demonstrate that our approach yields better classification than Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.
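The two-step idea described above (build a matrix whose columns describe local trends, then run Dynamic Time Warping on the columns) can be illustrated with a minimal sketch. This is not the DTW+S implementation. The window width `W`, the three reference shapes in `SHAPES`, the use of Pearson correlation as the trend score, and all function names are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical settings, not taken from the source: window width and
# three reference shapes of that width.
W = 4
SHAPES = np.array([
    [0.0, 1.0, 2.0, 3.0],   # increase
    [0.0, 2.0, 2.0, 0.0],   # peak
    [3.0, 2.0, 1.0, 0.0],   # decrease
])

def _corr(a, b):
    """Pearson correlation; returns 0 when either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def trend_matrix(series):
    """Return a (num_shapes x num_windows) matrix; column j scores window j."""
    x = np.asarray(series, dtype=float)
    windows = [x[j:j + W] for j in range(len(x) - W + 1)]
    return np.array([[_corr(w, s) for w in windows] for s in SHAPES])

def dtw_columns(A, B):
    """Unconstrained DTW; local cost is the Euclidean distance between columns."""
    n, m = A.shape[1], B.shape[1]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[:, i - 1] - B[:, j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def trend_dtw_distance(s1, s2):
    """Distance between two series via their trend matrices."""
    return dtw_columns(trend_matrix(s1), trend_matrix(s2))
```

Because the assumed trend score is a correlation, this sketch ignores scale: a curve and a tenfold-scaled copy of it have a distance of (numerically) zero, while a rising and a falling curve are far apart. An unconstrained warping path is used here for brevity; restricting how far the path may stray from the diagonal would be one way to favour trends that occur around similar times.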