The range of algorithms for time-series classification has grown considerably over the past decades, and now includes sophisticated but hard-to-interpret deep-learning methods. Without comparison to simpler methods, however, it can be difficult to determine when such complexity is required to obtain strong performance on a given problem. Here we evaluate the performance of an extremely simple classification approach: a linear classifier in the space of two simple features that ignore the sequential ordering of the data, namely the mean and standard deviation of the time-series values. Across a large repository of 128 univariate time-series classification problems, this simple distributional moment-based approach outperformed chance on 69 problems and reached 100% accuracy on two. In a neuroimaging time-series case study, we find that a simple linear model based on the mean and standard deviation classifies individuals with schizophrenia better than a model that additionally includes features of the time-series dynamics. Comparing against the performance of simple distributional features of a time series provides important context for interpreting the performance of complex time-series classification models, which may not always be required to obtain high accuracy.
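To make the baseline concrete, the following is a minimal sketch of a linear classifier operating on the mean and standard deviation alone. It is not the authors' implementation: the synthetic two-class data, the function names (`moment_features`, `fit_logistic`, `predict`, `make_dataset`), and the use of logistic regression trained by gradient descent as the linear classifier are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import math
import random
import statistics


def moment_features(series):
    """Return (mean, standard deviation); both ignore the ordering of values."""
    return (statistics.fmean(series), statistics.pstdev(series))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic-regression (linear) classifier by batch gradient descent.

    Illustrative stand-in for any linear classifier. Features are z-scored
    with training-set statistics, which are returned as part of the model.
    """
    n_features = len(X[0])
    cols = list(zip(*X))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero SD
    Z = [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for z, label in zip(Z, y):
            score = sum(wi * zi for wi, zi in zip(w, z)) + b
            err = 1.0 / (1.0 + math.exp(-score)) - label
            grad_w = [g + err * zi for g, zi in zip(grad_w, z)]
            grad_b += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(Z)
    return w, b, (mu, sd)


def predict(model, features):
    """Classify one (mean, SD) feature pair with the fitted linear model."""
    w, b, (mu, sd) = model
    z = [(v - m) / s for v, m, s in zip(features, mu, sd)]
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b > 0 else 0


def make_dataset(rng, n_per_class, length=100):
    """Hypothetical toy data: two classes of Gaussian noise differing only in spread."""
    X, y = [], []
    for label, scale in ((0, 1.0), (1, 2.0)):
        for _ in range(n_per_class):
            series = [rng.gauss(0.0, scale) for _ in range(length)]
            X.append(moment_features(series))
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_dataset(rng, 40)
X_test, y_test = make_dataset(rng, 40)
model = fit_logistic(X_train, y_train)
accuracy = sum(predict(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because both features are computed from the distribution of values only, shuffling a series in time leaves its feature pair unchanged; any accuracy such a baseline achieves therefore cannot be attributed to temporal dynamics, which is the context the abstract argues more complex models should be compared against.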