Order-preserving pattern (OPP) mining is a recently proposed sequential pattern mining method that finds frequent relative orders in a time series. Although frequent relative orders can serve as features for time series classification, the mined patterns do not reflect the differences between two classes of time series well. To discover these differences effectively, this paper addresses top-k contrast OPP (COPP) mining and proposes the COPP-Miner algorithm, which discovers the top-k contrast patterns as features for time series classification; mining the top-k patterns avoids the problem of improper parameter setting. COPP-Miner has three parts: extreme point extraction, which shortens the original time series, followed by forward mining and reverse mining, which discover the COPPs. Forward mining consists of three steps: a group pattern fusion strategy that generates candidate patterns, a support rate calculation method that computes the support of a pattern efficiently, and two pruning strategies that further reduce the set of candidate patterns. Reverse mining follows the same process as forward mining and uses one pruning strategy for candidate patterns. Experimental results validate the efficiency of the proposed algorithm and show that top-k COPPs, used as features, yield better classification performance.
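To make the terms in the abstract concrete, the following is a minimal Python sketch of extreme point extraction, the relative order of a sub-series, a support rate, and a brute-force top-k contrast ranking. It is not the COPP-Miner algorithm: it uses no pattern fusion and no pruning, and it simply enumerates every window. The abstract does not define the support rate or the contrast measure, so the definitions here are assumptions made for illustration: the support rate is taken as the fraction of series in a class that contain the pattern, and the contrast as the absolute difference between the two classes' support rates. All function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Pattern = Tuple[int, ...]


def extreme_points(series: Sequence[float]) -> List[float]:
    """Keep both endpoints and every strict local maximum or minimum."""
    if len(series) < 3:
        return list(series)
    kept = [series[0]]
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            kept.append(cur)
    kept.append(series[-1])
    return kept


def relative_order(window: Sequence[float]) -> Pattern:
    """Rank of each value in the window (1 = smallest); ties broken by position."""
    order = sorted(range(len(window)), key=lambda i: (window[i], i))
    ranks = [0] * len(window)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return tuple(ranks)


def patterns_in(series: Sequence[float], m: int) -> Set[Pattern]:
    """All distinct relative orders of the length-m windows of one series."""
    return {relative_order(series[i:i + m]) for i in range(len(series) - m + 1)}


def support_rate(pattern: Pattern, dataset: Sequence[Sequence[float]]) -> float:
    """ASSUMED definition: fraction of series containing the pattern at least once."""
    m = len(pattern)
    hits = sum(1 for s in dataset if pattern in patterns_in(s, m))
    return hits / len(dataset)


def top_k_contrast(pos, neg, m: int, k: int) -> List[Tuple[Pattern, float]]:
    """Brute-force ranking by ASSUMED contrast = |support in pos - support in neg|."""
    candidates: Set[Pattern] = set()
    for s in list(pos) + list(neg):
        candidates |= patterns_in(s, m)
    scored = [(p, abs(support_rate(p, pos) - support_rate(p, neg)))
              for p in candidates]
    # Highest contrast first; ties resolved by the pattern itself for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    pos = [[1, 3, 2, 4], [2, 5, 3, 6]]   # zig-zag shaped series
    neg = [[4, 3, 2, 1], [6, 5, 2, 1]]   # strictly decreasing series
    print(extreme_points([1, 3, 2, 2, 5, 4]))
    print(top_k_contrast(pos, neg, m=3, k=2))
```

Because the ranking returns the k highest-scoring patterns, the caller chooses only `k` and the pattern length `m` rather than a contrast threshold. The exhaustive enumeration is adequate for toy data only; the candidate generation and pruning strategies described in the abstract are what would make the search practical on real time series.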