Automatic Feature Engineering for Time Series Classification: Evaluation and Discussion

Time Series Classification (TSC) has received much attention over the past two decades and remains a crucial and challenging problem in data science and knowledge engineering. With the increasing availability of time series data, the research community has proposed many TSC algorithms. Besides state-of-the-art methods based on similarity measures, intervals, shapelets, dictionaries, deep learning, or hybrid ensembles, several tools have been designed in recent years to extract informative summary statistics, also known as features, from time series in an unsupervised manner. These feature engineering tools were originally designed for descriptive analysis and visualization of time series through informative and interpretable features, and very few of them have been benchmarked on TSC problems or compared with state-of-the-art TSC algorithms in terms of predictive performance. In this article, we aim to fill this gap and propose a simple TSC process to evaluate the potential predictive performance of the feature sets obtained with existing feature engineering tools. We present an empirical study of 11 feature engineering tools combined with 9 supervised classifiers over 112 time series data sets. The analysis of the results of more than 10,000 learning experiments indicates that feature-based methods perform as accurately as current state-of-the-art TSC algorithms, and thus should rightfully be considered further in the TSC literature.
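The two-step process described above (unsupervised feature extraction followed by a supervised classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the five summary statistics, the `extract_features` helper, the toy `NearestCentroid` classifier, and the synthetic data are hypothetical stand-ins, not the 11 tools, 9 classifiers, or 112 data sets evaluated in the study.

```python
import math
import random
import statistics


def extract_features(series):
    """Map one univariate series to a fixed-length vector of summary statistics.

    Hypothetical feature set chosen for illustration only.
    """
    n = len(series)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Least-squares slope of the series against its time index.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - mean) for t, x in enumerate(series)) / denom
    return [mean, std, min(series), max(series), slope]


class NearestCentroid:
    """Toy classifier: predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids_ = {}
        for label in sorted(set(y)):
            rows = [x for x, lab in zip(X, y) if lab == label]
            self.centroids_[label] = [statistics.fmean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda lab: math.dist(x, self.centroids_[lab]))
            for x in X
        ]


def make_toy_data(n_per_class, length, seed):
    """Synthetic two-class data: noisy flat series (0) vs. noisy upward trends (1)."""
    rng = random.Random(seed)
    series, labels = [], []
    for label, trend in ((0, 0.0), (1, 0.5)):
        for _ in range(n_per_class):
            series.append([trend * t + rng.gauss(0, 1) for t in range(length)])
            labels.append(label)
    return series, labels


train_series, y_train = make_toy_data(20, 50, seed=0)
test_series, y_test = make_toy_data(10, 50, seed=1)

# Step 1: unsupervised feature extraction (labels are not used here).
X_train = [extract_features(s) for s in train_series]
X_test = [extract_features(s) for s in test_series]

# Step 2: supervised classification on the resulting tabular feature matrix.
clf = NearestCentroid().fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"accuracy: {accuracy:.2f}")
```

The point of the separation is that the feature extractor never sees the labels, so any feature engineering tool can in principle be paired with any tabular classifier, which is what makes a tools-by-classifiers benchmark grid possible.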

