We introduce ASTRIDE (Adaptive Symbolization for Time seRIes DatabasEs), a novel symbolic representation of time series, along with its accelerated variant FASTRIDE (Fast ASTRIDE). Unlike most symbolization procedures, ASTRIDE is adaptive in both of its steps: segmentation, through change-point detection, and quantization, through quantiles. Instead of processing signals one at a time, ASTRIDE builds a single dictionary of symbols shared by all signals in a data set. We also introduce D-GED (Dynamic General Edit Distance), a novel similarity measure on symbolic representations based on the general edit distance. We compare the ASTRIDE and FASTRIDE representations with SAX (Symbolic Aggregate approXimation), 1d-SAX, SFA (Symbolic Fourier Approximation), and ABBA (Adaptive Brownian Bridge-based Aggregation) on reconstruction tasks and, where applicable, on classification tasks. All algorithms are evaluated on 86 univariate, equal-length data sets from the UCR Time Series Classification Archive. An open-source GitHub repository, astride, provides Python code to reproduce all the experiments.
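To make the two adaptive steps concrete, here is a minimal, hypothetical sketch in pure Python. It is not the astride package. Assumptions: greedy binary segmentation with an L2 (mean-shift) cost stands in for the change-point detection step. Every signal gets the same fixed number of breakpoints (`n_bkps`). Segment means pooled over the whole data set are cut at empirical quantiles into `n_symbols` symbols, giving one shared dictionary. The comparison uses a generic weighted edit distance, which is not the D-GED definition. All function and parameter names are illustrative.

```python
# Minimal sketch of an ASTRIDE-style pipeline. All names and parameters
# (n_bkps, n_symbols, gap) are hypothetical; this is NOT the astride package.
from bisect import bisect_right
from statistics import mean, quantiles


def segment_cost(x, start, end):
    """L2 cost of x[start:end]: squared deviations from the segment mean."""
    seg = x[start:end]
    mu = mean(seg)
    return sum((v - mu) ** 2 for v in seg)


def binary_segmentation(x, n_bkps, min_size=2):
    """Greedy mean-shift change-point detection (an assumed stand-in).

    Returns sorted segment end indices; the last one is len(x)."""
    bkps = [len(x)]
    for _ in range(n_bkps):
        best = None  # (cost reduction, split index)
        start = 0
        for end in bkps:
            base = segment_cost(x, start, end)
            for t in range(start + min_size, end - min_size + 1):
                gain = base - segment_cost(x, start, t) - segment_cost(x, t, end)
                if best is None or gain > best[0]:
                    best = (gain, t)
            start = end
        if best is None:
            break  # no segment is long enough to split further
        bkps = sorted(bkps + [best[1]])
    return bkps


def segment_means(x, bkps):
    """Mean value of each detected segment."""
    start, means = 0, []
    for end in bkps:
        means.append(mean(x[start:end]))
        start = end
    return means


def symbolize(signals, n_bkps=2, n_symbols=3):
    """Adaptive segmentation per signal, then one quantile-based dictionary
    shared by every signal in the data set."""
    all_means = [segment_means(x, binary_segmentation(x, n_bkps)) for x in signals]
    pooled = [m for means in all_means for m in means]
    edges = quantiles(pooled, n=n_symbols)  # n_symbols - 1 cut points
    alphabet = [chr(ord("a") + k) for k in range(n_symbols)]
    words = ["".join(alphabet[bisect_right(edges, m)] for m in means)
             for means in all_means]
    # Representative value of each symbol: mean of the pooled segment means in
    # its bin (an empty bin, possible with ties, falls back to the global mean).
    bins = {s: [] for s in alphabet}
    for m in pooled:
        bins[alphabet[bisect_right(edges, m)]].append(m)
    centers = {s: mean(v) if v else mean(pooled) for s, v in bins.items()}
    return words, centers


def weighted_edit_distance(a, b, centers, gap=1.0):
    """Generic weighted edit distance (an assumption, not D-GED): substituting
    symbol p by q costs |centers[p] - centers[q]|; insert/delete costs gap."""
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = i * gap
    for j in range(1, len(b) + 1):
        d[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = abs(centers[a[i - 1]] - centers[b[j - 1]])
            d[i][j] = min(d[i - 1][j] + gap, d[i][j - 1] + gap, d[i - 1][j - 1] + sub)
    return d[len(a)][len(b)]


# Usage on a toy data set of three equal-length signals.
signals = [
    [0.0] * 10 + [5.0] * 10 + [1.0] * 10,
    [0.0] * 15 + [5.0] * 15,
    [5.0] * 10 + [0.0] * 20,
]
words, centers = symbolize(signals)
print(words)
print(weighted_edit_distance(words[0], words[1], centers))
```

The quantile edges are computed once, from segment means pooled across all signals. As a result, a given symbol covers the same value range in every word, so symbol-to-symbol substitution costs are comparable across signals. Per-signal quantization would not provide that guarantee.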