Time Series Classification (TSC) has received much attention over the past two decades and remains a crucial and challenging problem in data science and knowledge engineering. Indeed, with the increasing availability of time series data, the research community has proposed many TSC algorithms. Besides state-of-the-art methods based on similarity measures, intervals, shapelets, dictionaries, deep learning, or hybrid ensembles, several tools have been designed in recent years to extract informative summary statistics, also known as features, from time series in an unsupervised way. These feature engineering tools were originally designed for the descriptive analysis and visualization of time series through informative and interpretable features, and very few of them have been benchmarked on TSC problems or compared with state-of-the-art TSC algorithms in terms of predictive performance. In this article, we aim to fill this gap and propose a simple TSC process to evaluate the potential predictive performance of the feature sets obtained with existing feature engineering tools. We thus present an empirical study of 11 feature engineering tools combined with 9 supervised classifiers over 112 time series data sets. The analysis of the results of more than 10,000 learning experiments indicates that feature-based methods perform as accurately as current state-of-the-art TSC algorithms, and should therefore be given further consideration in the TSC literature.
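The two-stage process described above (unsupervised feature extraction, then a standard supervised classifier trained on the resulting feature vectors) can be sketched as follows. This is a minimal illustration under stated assumptions: the six summary statistics and the 1-nearest-neighbour classifier are stand-ins chosen for brevity, not any of the 11 feature engineering tools or 9 classifiers evaluated in the study, and every function name here (`extract_features`, `feature_based_tsc`, `make_series`, and so on) is hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def extract_features(series):
    """Map one raw time series to a fixed-length vector of summary statistics.
    Hypothetical toy feature set; real feature engineering tools compute many more."""
    n = len(series)
    mu = mean(series)
    sd = pstdev(series)
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Lag-1 autocorrelation; defined as 0.0 for a constant series.
    if sd > 0:
        acf1 = sum((series[i] - mu) * (series[i + 1] - mu) for i in range(n - 1)) / (n * sd * sd)
    else:
        acf1 = 0.0
    return [mu, sd, min(series), max(series), mean([abs(d) for d in diffs]), acf1]


def fit_scaler(X):
    """Per-feature mean and standard deviation, estimated on training data only."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def transform(X, scaler):
    """Standardize each feature with the training statistics."""
    mus, sds = scaler
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def predict_1nn(X_train, y_train, X_test):
    """Stand-in classifier: 1-nearest neighbour with squared Euclidean distance."""
    preds = []
    for x in X_test:
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in X_train]
        preds.append(y_train[dists.index(min(dists))])
    return preds


def feature_based_tsc(train_series, y_train, test_series):
    """Unsupervised feature extraction followed by supervised classification."""
    F_train = [extract_features(s) for s in train_series]
    F_test = [extract_features(s) for s in test_series]
    scaler = fit_scaler(F_train)
    return predict_1nn(transform(F_train, scaler), y_train, transform(F_test, scaler))


def make_series(label, rng, length=50):
    """Synthetic toy data: class 0 is a noisy sine wave, class 1 is white noise."""
    if label == 0:
        return [math.sin(2 * math.pi * t / 10) + rng.gauss(0, 0.1) for t in range(length)]
    return [rng.gauss(0, 1) for _ in range(length)]


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [0, 1] * 10
    train = [make_series(y, rng) for y in labels]
    test = [make_series(y, rng) for y in labels]
    preds = feature_based_tsc(train, labels, test)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    print(f"accuracy = {accuracy:.2f}")
```

Because the classifier only ever sees the feature vectors, any feature extractor and any off-the-shelf classifier can be swapped in independently, which is what makes such a process convenient for benchmarking many tool/classifier combinations.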