The increasing availability of temporal data poses a challenge to the time-series and signal-processing domains because of its high numerosity and complexity. In a variety of engineering applications, symbolic representations outperform raw data thanks to their storage efficiency, reduced numerosity, and noise reduction. ABBA, the most recent symbolic aggregate approximation technique, performs strongly at preserving the essential shape information of time series and at improving downstream applications. However, ABBA cannot represent multiple time series with a consistent set of symbols: the same symbol appearing in distinct time series does not denote the same pattern. In addition, obtaining an appropriate ABBA digitization involves the tedious task of tuning hyperparameters such as the number of symbols or the tolerance. We therefore present a joint symbolic aggregate approximation that has symbolic consistency, and show how the digitization hyperparameter can itself be optimized alongside the compression tolerance ahead of time. We also propose a novel computing paradigm that enables parallel computation of the symbolic approximation. Extensive experiments demonstrate its strong performance and speed in symbolic approximation and reconstruction.
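To make the notion of symbolic consistency concrete, the following is a minimal sketch, not the method proposed in this paper. It assumes an ABBA-style pipeline in which each series is first compressed into (length, increment) pieces and the pieces from all series are then pooled and clustered together, so that one shared alphabet labels every series. The function names `compress`, `kmeans`, and `joint_digitize`, the simplified error bound, and the plain k-means step are all illustrative assumptions.

```python
import numpy as np


def compress(ts, tol=0.1):
    """Greedy piecewise-linear compression into (length, increment) pieces.

    Assumption: a segment is extended while the squared error of the straight
    line through its endpoints stays within length * tol**2 (a simplified bound).
    """
    pieces, start, end, n = [], 0, 1, len(ts)
    while end < n:
        while end + 1 < n:
            cand = end + 1
            line = np.linspace(ts[start], ts[cand], cand - start + 1)
            err = np.sum((line - ts[start:cand + 1]) ** 2)
            if err > (cand - start) * tol ** 2:
                break
            end = cand
        pieces.append((end - start, ts[end] - ts[start]))
        start, end = end, end + 1
    return pieces


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means, used here as a stand-in clustering step."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    # Final assignment against the final centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers


def joint_digitize(series_list, tol=0.1, k=4):
    """Cluster the pieces of all series together so symbols are shared."""
    per_series = [compress(np.asarray(ts, dtype=float), tol) for ts in series_list]
    pooled = np.array([p for pieces in per_series for p in pieces], dtype=float)
    scale = pooled.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant columns
    labels, centers = kmeans(pooled / scale, k)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    strings, pos = [], 0
    for pieces in per_series:
        chunk = labels[pos:pos + len(pieces)]
        strings.append("".join(alphabet[i] for i in chunk))
        pos += len(pieces)
    return strings, centers * scale


if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 100)
    strings, centers = joint_digitize([np.sin(t), np.cos(t)], tol=0.1, k=4)
    print(strings)
```

Because the cluster centers are fitted once on the pooled pieces, a given symbol refers to the same (length, increment) prototype in every series. Digitizing each series separately would assign letters independently per series, which is the inconsistency described above. The sketch requires `k` to be at most 26 and no larger than the total number of pieces.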