In 2017, a research paper compared 18 Time Series Classification (TSC) algorithms on 85 datasets from the University of California, Riverside (UCR) archive. This study, commonly referred to as a 'bake off', found that only nine algorithms performed significantly better than the Dynamic Time Warping (DTW) and Rotation Forest benchmarks it used. The study categorised each algorithm by the type of feature it extracts from time series data, forming a taxonomy of five main algorithm types. This categorisation, together with the provision of code and accessible results for reproducibility, has helped fuel an increase in the popularity of the TSC field. Over six years have passed since this bake off; the UCR archive has expanded to 112 datasets, and a large number of new algorithms have been proposed. We revisit the bake off, examining how each of the proposed categories has advanced since the original publication, and evaluate the performance of newer algorithms against the previous best-of-category using the expanded UCR archive. We extend the taxonomy with three new categories to reflect recent developments. Alongside the originally proposed distance, interval, shapelet, dictionary and hybrid based algorithms, we compare newer convolution and feature based algorithms as well as deep learning approaches. We introduce 30 classification datasets, either recently donated to the archive or reformatted to the TSC format, and use these to further evaluate the best-performing algorithm from each category. Overall, we find that two recently proposed algorithms, Hydra+MultiROCKET and HIVE-COTEv2, perform significantly better than other approaches on both the current and new TSC problems.