Current methods for pattern analysis in time series rely mainly on statistical features or on probabilistic learning and inference to identify patterns and trends in the data. Such methods generalize poorly to multivariate, multi-source, state-varying, and noisy time-series data. To address these issues, we propose a highly generalizable method that uses information theory-based features to identify and learn from patterns in multivariate time-series data. To demonstrate the proposed approach, we analyze pattern changes in human activity data. For applications with stochastic state transitions, we develop features based on four Markov-chain quantities: Shannon entropy, entropy rate, entropy production, and von Neumann entropy. For applications where state modeling is not applicable, we use five entropy variants: approximate entropy, increment entropy, dispersion entropy, phase entropy, and slope entropy. The results show that the proposed information theory-based features improve recall, F1 score, and accuracy by up to 23.01% on average compared with the baseline models, while using a simpler model structure with an average 18.75-fold reduction in the number of model parameters.
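As an illustration of one of the Markov-chain features named in the abstract, the entropy rate of a chain with transition matrix P and stationary distribution π is H = -Σ_i π_i Σ_j P_ij log2 P_ij. The following is a minimal sketch and not the authors' implementation: the function names are hypothetical, the input is assumed to be a sequence already discretized into integer states, and the lazy power iteration and the uniform rows for unvisited states are choices made here for simplicity.

```python
import math


def transition_matrix(states, n_states):
    """Estimate a row-stochastic transition matrix from an integer state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state with no observed outgoing transition gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
    return matrix


def stationary_distribution(P, iters=10000, tol=1e-12):
    """Approximate the stationary distribution with lazy power iteration.

    Averaging pi with pi @ P keeps the same stationary distribution
    but avoids oscillation on periodic chains.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        nxt = [0.5 * pi[j] + 0.5 * step[j] for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def entropy_rate(P, pi):
    """Entropy rate in bits per step; zero-probability transitions contribute nothing."""
    return -sum(
        pi[i] * p * math.log2(p)
        for i, row in enumerate(P)
        for p in row
        if p > 0
    )


if __name__ == "__main__":
    # Hypothetical activity-state sequence (e.g., 0 = rest, 1 = walk, 2 = run).
    sequence = [0, 0, 1, 1, 2, 1, 0, 0, 1, 2, 2, 1, 0]
    P = transition_matrix(sequence, 3)
    pi = stationary_distribution(P)
    print(f"entropy rate: {entropy_rate(P, pi):.4f} bits/step")
```

A strictly alternating sequence yields an entropy rate of zero, since every transition is deterministic, while a chain whose rows are all uniform over two states yields one bit per step. A feature of this kind could be computed per window of a multivariate series after each channel is mapped to discrete states.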