Performance benchmarking is a crucial component of time series classification (TSC) algorithm design, and a fast-growing number of datasets have been established for empirical benchmarking. However, empirical benchmarks are costly and do not guarantee statistical optimality. This study proposes benchmarking the optimality of TSC algorithms in distinguishing diffusion processes against the likelihood ratio test (LRT). The LRT is optimal in the sense of the Neyman-Pearson lemma: it has the smallest false positive rate among classifiers with a controlled level of false negative rate. The LRT requires the likelihood ratio of the time series to be computable. Diffusion processes defined by stochastic differential equations provide such time series, and their design is flexible enough to generate both linear and nonlinear time series. We demonstrate the benchmarking with three scalable state-of-the-art TSC algorithms: random forest, ResNet, and ROCKET. Test results show that they can achieve LRT optimality for univariate time series and multivariate Gaussian processes. However, these model-agnostic algorithms are suboptimal in classifying nonlinear multivariate time series from high-dimensional stochastic interacting particle systems. Additionally, the LRT benchmark provides tools to analyze how classification accuracy depends on the time length, dimension, temporal sampling frequency, and randomness of the time series. Thus, the LRT with diffusion processes can systematically and efficiently benchmark the optimality of TSC algorithms and may guide their future improvements.
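To make the idea of an LRT benchmark concrete, the following is a minimal sketch, not the study's actual experimental setup. It assumes two univariate Ornstein-Uhlenbeck processes, dX = -theta * X dt + sigma dW, that differ only in the drift parameter theta, simulates them with the Euler-Maruyama scheme, and classifies each path by the sign of the log-likelihood ratio computed from the Gaussian Euler-Maruyama transition densities. The function names (`simulate_path`, `log_likelihood_ratio`, `lrt_accuracy`), the parameter values, and the zero threshold (equal class priors) are all illustrative assumptions.

```python
import math
import random


def simulate_path(theta, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama path of dX = -theta * X dt + sigma dW (illustrative model)."""
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        noise = rng.gauss(0.0, 1.0)
        path.append(x - theta * x * dt + sigma * math.sqrt(dt) * noise)
    return path


def log_likelihood_ratio(path, theta0, theta1, sigma, dt):
    """log p1(path) - log p0(path) under Gaussian Euler-Maruyama transitions.

    Each increment is N(-theta * x * dt, sigma^2 * dt), so the normalizing
    constants cancel and only the squared residuals remain.
    """
    total = 0.0
    for x, x_next in zip(path, path[1:]):
        dx = x_next - x
        r0 = dx + theta0 * x * dt  # residual under hypothesis 0
        r1 = dx + theta1 * x * dt  # residual under hypothesis 1
        total += (r0 * r0 - r1 * r1) / (2.0 * sigma * sigma * dt)
    return total


def lrt_accuracy(theta0, theta1, sigma=1.0, x0=1.0, dt=0.01,
                 n_steps=200, n_paths=500, seed=0):
    """Accuracy of the LRT classifier (threshold 0, i.e. equal priors assumed)."""
    rng = random.Random(seed)
    correct = 0
    for i in range(n_paths):
        label = i % 2  # alternate classes so both are equally represented
        theta = theta1 if label else theta0
        path = simulate_path(theta, sigma, x0, dt, n_steps, rng)
        llr = log_likelihood_ratio(path, theta0, theta1, sigma, dt)
        predicted = 1 if llr > 0.0 else 0
        correct += int(predicted == label)
    return correct / n_paths


if __name__ == "__main__":
    print("LRT accuracy:", lrt_accuracy(theta0=1.0, theta1=2.0))
```

Under these assumptions, the accuracy returned by `lrt_accuracy` serves as the reference level: a TSC algorithm trained and tested on paths drawn from the same two processes can be compared against it, and varying `n_steps`, `dt`, or `sigma` gives a rough view of how the achievable accuracy depends on path length, sampling frequency, and noise level.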