A time series is a sequence of real values ordered in time. Time series classification (TSC) is the task of assigning a time series to one of a set of predefined classes, usually based on a model learned from examples. Dictionary-based methods for TSC rely on counting the frequency of certain patterns in time series and are important components of the currently most accurate TSC ensembles. One of the early dictionary-based methods was WEASEL, which at the time achieved state-of-the-art (SotA) results while also being very fast. However, it has since been outperformed by other methods in terms of both speed and accuracy. Furthermore, its design leads to an unpredictably large memory footprint, making it unsuitable for many applications. In this paper, we present WEASEL 2.0, a complete overhaul of WEASEL based on two recent advances in TSC: dilation and ensembling of randomized hyper-parameter settings. These two techniques allow WEASEL 2.0 to work with a fixed-size memory footprint while also improving accuracy. Compared to 15 other SotA methods on the UCR benchmark set, WEASEL 2.0 is significantly more accurate than other dictionary methods and not significantly worse than the currently best methods. In fact, it achieves the highest median accuracy over all data sets, and it performs best in 5 out of 12 problem classes. We therefore believe that WEASEL 2.0 is a viable alternative to current TSC methods and a potentially interesting component of future ensembles.
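The abstract names three ingredients: counting patterns (a bag of words), dilation, and an ensemble of randomized hyper-parameter settings. The minimal Python sketch below shows one way these can combine so that the feature vector's size is fixed in advance by the sampled configurations rather than by the data. It is not the WEASEL 2.0 implementation. The discretization is a simple segment-mean quantizer with fixed breakpoints, assumed here as a stand-in for the Fourier-based SFA words used by WEASEL. There is no feature selection or classifier, and all function names, breakpoints, and parameter ranges are hypothetical.

```python
import math
import random


def dilated_windows(series, window, dilation):
    """Yield subsequences of `window` points taken `dilation` steps apart."""
    span = (window - 1) * dilation + 1  # number of original points one window covers
    for start in range(len(series) - span + 1):
        yield [series[start + i * dilation] for i in range(window)]


def to_word(window, breakpoints, word_length):
    """Hypothetical stand-in for SFA: quantize segment means into symbols.

    The window is split into `word_length` equal segments (trailing points
    that do not fill a segment are ignored); each segment mean is mapped to
    the number of breakpoints it exceeds, giving a symbol in 0..len(breakpoints).
    """
    seg = len(window) // word_length
    word = []
    for k in range(word_length):
        part = window[k * seg:(k + 1) * seg]
        mean = sum(part) / len(part)
        word.append(sum(mean > b for b in breakpoints))
    return tuple(word)


def bag_of_words(series, window, dilation, word_length, breakpoints):
    """Count word occurrences in a histogram of fixed size alphabet**word_length."""
    alphabet = len(breakpoints) + 1
    hist = [0] * (alphabet ** word_length)  # size does not depend on len(series)
    for w in dilated_windows(series, window, dilation):
        idx = 0
        for symbol in to_word(w, breakpoints, word_length):
            idx = idx * alphabet + symbol  # base-`alphabet` encoding of the word
        hist[idx] += 1
    return hist


def random_configs(n, series_length, seed=0):
    """Sample n hypothetical (window, dilation, word_length) settings."""
    rng = random.Random(seed)
    configs = []
    for _ in range(n):
        word_length = rng.choice([2, 3, 4])
        window = rng.randint(word_length, max(word_length, series_length // 4))
        # Largest dilation for which one dilated window still fits in the series.
        max_dilation = max(1, (series_length - 1) // (window - 1))
        configs.append((window, rng.randint(1, max_dilation), word_length))
    return configs


def transform(series, configs, breakpoints=(-0.5, 0.5)):
    """Concatenate one bag-of-words histogram per configuration (the ensemble)."""
    features = []
    for window, dilation, word_length in configs:
        features.extend(bag_of_words(series, window, dilation, word_length, breakpoints))
    return features


if __name__ == "__main__":
    series = [math.sin(i / 3) for i in range(40)]
    configs = random_configs(5, len(series), seed=1)
    features = transform(series, configs)
    # Feature length is fixed by the configurations alone: sum of 3**word_length.
    print(configs, len(features))
```

In this sketch each configuration contributes a histogram of exactly `alphabet ** word_length` counts, so the total memory is known once the random configurations are drawn, regardless of series length. Dilation lets a short window span a long stretch of the series without enlarging the vocabulary.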