Performance benchmarking is a crucial component of time series classification (TSC) algorithm design, and a fast-growing number of datasets have been established for empirical benchmarking. However, empirical benchmarks are costly and do not guarantee statistical optimality. This study proposes benchmarking the optimality of TSC algorithms in distinguishing diffusion processes against the likelihood ratio test (LRT). The LRT is optimal in the sense of the Neyman-Pearson lemma: it has the smallest false positive rate among classifiers with a controlled level of false negative rate. The LRT requires the likelihood ratio of the time series to be computable. Diffusion processes defined by stochastic differential equations provide such time series and can be designed flexibly to generate linear or nonlinear time series. We demonstrate the benchmarking with three scalable state-of-the-art TSC algorithms: random forest, ResNet, and ROCKET. Test results show that they can achieve LRT optimality for univariate time series and multivariate Gaussian processes. However, these model-agnostic algorithms are suboptimal in classifying nonlinear multivariate time series from high-dimensional stochastic interacting particle systems. Additionally, the LRT benchmark provides tools to analyze how classification accuracy depends on the time length, dimension, temporal sampling frequency, and randomness of the time series. Thus, the LRT with diffusion processes can systematically and efficiently benchmark the optimality of TSC algorithms and may guide their future improvements.
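To illustrate how an LRT benchmark of this kind could be computed, the following minimal Python sketch classifies paths of two Ornstein-Uhlenbeck processes that differ only in their drift. It is not the authors' implementation: the Ornstein-Uhlenbeck model, the Euler-Maruyama discretization, the equal-prior threshold of zero, and all function names and parameter values (`simulate_paths`, `log_likelihood_ratio`, `lrt_accuracy`, `theta0`, `theta1`, and so on) are assumptions made for illustration only.

```python
import numpy as np


def simulate_paths(drift, sigma, x0, dt, n_steps, n_paths, rng):
    """Euler-Maruyama paths of dX = drift(X) dt + sigma dW.

    Returns an array of shape (n_paths, n_steps + 1).
    """
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        noise = rng.standard_normal(n_paths)
        x[:, k + 1] = x[:, k] + drift(x[:, k]) * dt + sigma * np.sqrt(dt) * noise
    return x


def log_likelihood_ratio(paths, drift0, drift1, sigma, dt):
    """log(p1 / p0) of each path under Gaussian Euler-Maruyama transitions.

    Each increment is N(drift(x) dt, sigma^2 dt) under either hypothesis, so
    the per-step log ratio is [(b1 - b0) dx - 0.5 (b1^2 - b0^2) dt] / sigma^2.
    """
    x = paths[:, :-1]
    dx = np.diff(paths, axis=1)
    b0, b1 = drift0(x), drift1(x)
    return np.sum((b1 - b0) * dx - 0.5 * (b1**2 - b0**2) * dt, axis=1) / sigma**2


def lrt_accuracy(theta0=1.0, theta1=2.0, sigma=1.0, dt=0.01,
                 n_steps=1000, n_paths=2000, seed=0):
    """Balanced accuracy of the LRT on simulated paths from both classes."""
    rng = np.random.default_rng(seed)

    def drift0(x):
        return -theta0 * x  # hypothetical class-0 drift

    def drift1(x):
        return -theta1 * x  # hypothetical class-1 drift

    paths0 = simulate_paths(drift0, sigma, 0.0, dt, n_steps, n_paths, rng)
    paths1 = simulate_paths(drift1, sigma, 0.0, dt, n_steps, n_paths, rng)
    llr0 = log_likelihood_ratio(paths0, drift0, drift1, sigma, dt)
    llr1 = log_likelihood_ratio(paths1, drift0, drift1, sigma, dt)
    # With equal class priors, predict class 1 when the log ratio is positive.
    return 0.5 * (np.mean(llr0 <= 0) + np.mean(llr1 > 0))
```

Calling `lrt_accuracy()` returns the LRT's balanced accuracy on the simulated paths; a TSC algorithm trained and tested on data generated the same way could then be compared against that value. Because the paths here are generated by the same Euler-Maruyama scheme used in the likelihood, the ratio is exact for the simulated data; for a finely sampled continuous-time process it would only be an approximation.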