Current methods for pattern analysis in time series rely mainly on statistical features or on probabilistic learning and inference to identify patterns and trends in the data. Such methods do not generalize well to multivariate, multi-source, state-varying, and noisy time-series data. To address these issues, we propose a highly generalizable method that uses information theory-based features to identify and learn from patterns in multivariate time-series data. To demonstrate the proposed approach, we analyze pattern changes in human activity data. For applications with stochastic state transitions, the features are based on the Shannon entropy, entropy rate, entropy production, and von Neumann entropy of Markov chains. For applications where state modeling is not applicable, we use five entropy variants: approximate entropy, increment entropy, dispersion entropy, phase entropy, and slope entropy. The results show that, compared with the baseline models, the proposed information theory-based features improve recall, F1 score, and accuracy by up to 23.01\% on average, and that they do so with a simpler model structure that has, on average, 18.75 times fewer parameters.
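To make the Markov-chain features concrete, the following is a minimal sketch of one of them, the entropy rate, estimated from a discrete state sequence. It is an illustration under stated assumptions, not the authors' implementation: the function names (`transition_matrix`, `stationary_distribution`, `shannon_entropy`, `entropy_rate`), the uniform fallback for unvisited states, and the toy sequence are all hypothetical. For a chain with transition matrix $P$ and stationary distribution $\pi$, the entropy rate is $H = -\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}$.

```python
import math


def transition_matrix(states, n_states):
    """Estimate a row-stochastic transition matrix from an integer state sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state with no observed outgoing transition
            # gets a uniform row so the matrix stays row-stochastic.
            matrix.append([1.0 / n_states] * n_states)
    return matrix


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution by power iteration.

    Assumes the chain is ergodic; periodic or reducible chains may not
    converge to a unique distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution (0 log 0 := 0)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def entropy_rate(P, pi):
    """Entropy rate in bits per step: stationary-weighted row entropies."""
    return sum(pi[i] * shannon_entropy(P[i]) for i in range(len(P)))


# Toy usage: a hypothetical sequence of three activity states (0, 1, 2).
sequence = [0, 0, 1, 2, 2, 1, 0, 1, 1, 2, 0, 0, 1, 2, 2]
P = transition_matrix(sequence, n_states=3)
pi = stationary_distribution(P)
print("stationary entropy:", shannon_entropy(pi))
print("entropy rate:", entropy_rate(P, pi))
```

In a feature pipeline of the kind the abstract describes, such quantities would presumably be computed per window of the time series and passed to a downstream classifier; the windowing and state-discretization steps are outside this sketch.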