For multivariate time series driven by underlying states, hidden Markov models (HMMs) constitute a powerful framework that can be flexibly tailored to the situation at hand. In practice, however, it can be challenging to choose an adequate emission distribution for multivariate observation vectors. For example, the marginal distribution of the data may not immediately reveal the within-state distributional form, and the individual data streams may have different supports, rendering the common approach of using a multivariate normal distribution inadequate. Here we explore nonparametric estimation of the emission distributions within a multivariate HMM based on tensor-product B-splines. In two simulation studies, we show the feasibility of our modelling approach and demonstrate potential pitfalls of inappropriate choices of parametric distributions. To illustrate the practical applicability, we present a case study in which we use an HMM to model the bivariate time series comprising the lengths and angles of goalkeeper passes during UEFA EURO 2020, investigating the effect of match dynamics on the teams' tactics.
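To make the idea of a tensor-product B-spline emission density concrete, the following is a minimal sketch of a single state's bivariate density, written as a weighted sum of products of univariate B-spline basis functions. It is not taken from the paper: the function names `bspline_basis` and `tensor_density`, the equidistant knots, the cubic degree, the softmax-style weights, and the two example supports (a length on [0, 100] and an angle on [-π, π]) are all assumptions made for illustration. Each basis function is rescaled to integrate to one, so that nonnegative weights summing to one yield a valid density. The sketch covers only the density evaluation, not the HMM likelihood or its estimation.

```python
import numpy as np


def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion).

    Returns an array of shape (len(x), len(knots) - degree - 1) in which
    every column is rescaled to integrate to one. Assumes strictly
    increasing knots (hypothetical helper, not the authors' code).
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator functions of the half-open knot intervals.
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).astype(float)
    for k in range(1, degree + 1):
        left = (x[:, None] - t[None, :-k - 1]) / (t[k:-1] - t[:-k - 1])
        right = (t[None, k + 1:] - x[:, None]) / (t[k + 1:] - t[1:-k])
        B = left * B[:, :-1] + right * B[:, 1:]
    # A degree-k basis function on knots t_i, ..., t_{i+k+1} has integral
    # (t_{i+k+1} - t_i) / (k + 1); dividing by it normalizes each column.
    area = (t[degree + 1:] - t[:-degree - 1]) / (degree + 1)
    return B / area


def tensor_density(x, y, knots_x, knots_y, weights, degree=3):
    """Bivariate density sum_ij weights[i, j] * phi_i(x) * psi_j(y)."""
    Bx = bspline_basis(x, knots_x, degree)  # shape (n, Kx)
    By = bspline_basis(y, knots_y, degree)  # shape (n, Ky)
    return np.einsum("ni,ij,nj->n", Bx, weights, By)


# Illustrative setup: the two variables live on different supports.
rng = np.random.default_rng(0)
knots_x = np.linspace(0.0, 100.0, 14)      # 10 cubic basis functions
knots_y = np.linspace(-np.pi, np.pi, 12)   # 8 cubic basis functions

# Unconstrained parameters mapped to nonnegative weights that sum to one.
raw = rng.normal(size=(10, 8))
weights = np.exp(raw - raw.max())
weights /= weights.sum()

# Numerical check on a grid: the density should integrate to about one.
gx = np.linspace(0.0, 100.0, 401)
gy = np.linspace(-np.pi, np.pi, 401)
X, Y = np.meshgrid(gx, gy, indexing="ij")
dens = tensor_density(X.ravel(), Y.ravel(), knots_x, knots_y, weights)
total_mass = dens.sum() * (gx[1] - gx[0]) * (gy[1] - gy[0])
print(total_mass)
```

In a state-dependent setting, one such weight matrix per state would replace the parametric emission distribution. Parameterizing the weights through unconstrained values, as above, is one common way to keep them nonnegative and summing to one during numerical optimization.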