Finite-order Markov models are theoretically well-studied models for dependent discrete data. Despite their generality, they are rarely applied in empirical work when the order is large. Practitioners avoid higher-order Markov models because (1) the number of parameters grows exponentially with the order and (2) the interpretation is often difficult. Mixture of transition distribution (MTD) models were introduced to overcome both limitations. An MTD model represents a higher-order Markov model as a convex mixture of single-step Markov chains, reducing the number of parameters and improving interpretability. Nevertheless, in practice, estimation of MTD models of large order remains limited by the curse of dimensionality and high algorithmic complexity. Here, we prove that if only a few lags are relevant, we can consistently and efficiently recover those lags and estimate the transition probabilities of high-dimensional MTD models. The key innovation is a recursive procedure for selecting the relevant lags of the model. Our results are based on (1) a new structural result for the MTD and (2) an improved martingale concentration inequality. We illustrate our method using simulations and weather data.
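To make the convex-mixture representation concrete, the following minimal Python sketch evaluates an MTD transition distribution and compares parameter counts with a full Markov chain of the same order. It assumes the common formulation with one single-step transition matrix per lag and no independent (lag-free) component; the abstract does not state the exact parameterization, so the function names, the example weights and matrices, and the counting convention are all illustrative assumptions.

```python
def mtd_transition(past, weights, matrices):
    """Return P(X_t = a | past) for every symbol a under an MTD model.

    Assumed (hypothetical) conventions:
      past[j]        symbol observed at lag j + 1
      weights[j]     mixture weight of lag j + 1 (non-negative, summing to 1)
      matrices[j][x] row of the single-step transition matrix of lag j + 1
                     for the conditioning symbol x
    """
    m = len(matrices[0])  # alphabet size
    return [
        sum(w * P[x][a] for w, P, x in zip(weights, matrices, past))
        for a in range(m)
    ]


def full_markov_params(m, d):
    """Free parameters of an unrestricted order-d chain on m symbols."""
    return (m ** d) * (m - 1)


def mtd_params(m, d):
    """Free parameters of the assumed MTD form: d matrices plus d - 1 weights."""
    return d * m * (m - 1) + (d - 1)


# Illustrative binary example of order 3 (all values are made up).
weights = [0.5, 0.3, 0.2]
matrices = [
    [[0.9, 0.1], [0.2, 0.8]],  # lag 1
    [[0.6, 0.4], [0.3, 0.7]],  # lag 2
    [[0.5, 0.5], [0.5, 0.5]],  # lag 3
]
probs = mtd_transition((0, 1, 0), weights, matrices)
print(probs)                      # approximately [0.64, 0.36]
print(full_markov_params(2, 10))  # 1024
print(mtd_params(2, 10))          # 29
```

Under these assumptions, the full chain's parameter count grows exponentially in the order, while the MTD count grows linearly. A lag whose weight is zero drops out of the mixture entirely, which is the sense in which only a few lags may be relevant.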