Time series clustering is the act of grouping time series data without recourse to a label. Algorithms that cluster time series can be classified into two groups: those that employ a time-series-specific distance measure, and those that derive features from time series. Both approaches usually rely on traditional clustering algorithms such as $k$-means. Our focus is on distance-based time series clustering that employs elastic distance measures, i.e. distances that perform some kind of realignment whilst measuring distance. We describe nine commonly used elastic distance measures and compare their performance with $k$-means and $k$-medoids clustering. Our findings are surprising. The most popular technique, dynamic time warping (DTW), performs worse than Euclidean distance with $k$-means and, even when tuned, is no better. Using $k$-medoids rather than $k$-means improved the clusterings for all nine distance measures, yet with $k$-medoids DTW is still not significantly better than Euclidean distance. Generally, distance measures that employ editing in conjunction with warping perform better, and one of them, the move-split-merge (MSM) distance, is the best-performing measure in this study. We also compare against $k$-means clustering with DTW and DTW barycentre averaging (DBA). We find that DBA does improve DTW $k$-means, but standard DBA is still worse than MSM. We conclude by recommending MSM with $k$-medoids as the benchmark algorithm for clustering time series with elastic distance measures. Implementations are provided in the aeon toolkit, and results and guidance on reproducing them are available in the associated GitHub repository.
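To make the recommended combination concrete, here is a minimal pure-Python sketch of MSM distance feeding a $k$-medoids clusterer. It is illustrative only and is not the aeon implementation used in the study. It assumes univariate series, no warping window, and an MSM cost parameter `c=1.0`. Under MSM, a move costs $|x_i - y_j|$. A split or merge costs `c` when the new value lies between its neighbour and the value it is matched against, and `c` plus the distance to the nearer of the two otherwise. The clusterer is a simple alternating (Voronoi-iteration) $k$-medoids with a farthest-first initialisation; the study's $k$-medoids variant, initialisation and parameter settings may differ. The names `msm_distance`, `k_medoids`, `_assign`, `_msm_cost` and the toy series are hypothetical.

```python
from typing import List, Sequence, Tuple


def _msm_cost(new: float, prev: float, other: float, c: float) -> float:
    """Split/merge cost: c if `new` lies between `prev` and `other`,
    otherwise c plus the distance from `new` to the nearer of the two."""
    if prev <= new <= other or prev >= new >= other:
        return c
    return c + min(abs(new - prev), abs(new - other))


def msm_distance(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Move-split-merge distance between two univariate series (no window)."""
    n, m = len(x), len(y)
    d = [[0.0] * m for _ in range(n)]
    d[0][0] = abs(x[0] - y[0])
    for i in range(1, n):  # first column: x[i] split from / merged into x[i - 1]
        d[i][0] = d[i - 1][0] + _msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):  # first row: the same along y
        d[0][j] = d[0][j - 1] + _msm_cost(y[j], y[j - 1], x[0], c)
    for i in range(1, n):
        for j in range(1, m):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(x[i] - y[j]),                # move
                d[i - 1][j] + _msm_cost(x[i], x[i - 1], y[j], c),  # split/merge in x
                d[i][j - 1] + _msm_cost(y[j], x[i], y[j - 1], c),  # split/merge in y
            )
    return d[n - 1][m - 1]


def _assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each series with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda ci: dist[i][medoids[ci]])
            for i in range(len(dist))]


def k_medoids(dist: List[List[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) k-medoids on a precomputed distance matrix."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of series")
    # Farthest-first initialisation: an illustrative deterministic choice (assumed).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = _assign(dist, medoids)
        new_medoids = []
        for ci in range(k):
            members = [i for i in range(n) if labels[i] == ci]
            if not members:  # empty cluster: keep its previous medoid
                new_medoids.append(medoids[ci])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda p: sum(dist[p][q] for q in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return _assign(dist, medoids), medoids


# Toy data (hypothetical): two groups of similar series.
series = [[0, 0, 1, 1], [0, 0, 1, 1.1], [5, 5, 6, 6], [5, 5.2, 6, 6]]
dist = [[msm_distance(a, b) for b in series] for a in series]
labels, medoids = k_medoids(dist, k=2)
print(labels)  # [0, 0, 1, 1]
```

Computing the pairwise distance matrix once and clustering on it shows one practical appeal of medoids. Unlike $k$-means, no averaged series (such as a DBA barycentre) has to be constructed under the elastic distance, so any distance function can be plugged in unchanged.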