Hidden Markov models (HMMs) are a versatile statistical framework commonly used in ecology to characterize behavioural patterns from animal movement data. In HMMs, the observed data depend on a finite number of underlying hidden states, generally interpreted as the animal's unobserved behaviours. The number of states is a crucial parameter, controlling the trade-off between the ecological interpretability of behaviours (fewer states) and the goodness of fit of the model (more states). Selecting the number of states, commonly referred to as order selection, is notoriously challenging. Common model selection metrics, such as AIC and BIC, often perform poorly in determining the number of states, particularly when models are misspecified. Building on existing methods for HMMs and mixture models, we propose a double penalized maximum likelihood estimate (DPMLE) for the simultaneous estimation of the number of states and the parameters of non-stationary HMMs. The DPMLE differs from traditional information criteria in that it uses two penalty functions, applied to the stationary probabilities and the state-dependent parameters. For non-stationary HMMs, forward and backward probabilities are used to approximate the stationary probabilities. Using a simulation study that includes scenarios with additional complexity in the data, we compare the performance of our method with that of AIC and BIC. We also illustrate how the DPMLE differs from AIC and BIC using narwhal (Monodon monoceros) movement data. The proposed method outperformed AIC and BIC in identifying the correct number of states under model misspecification. Furthermore, its capacity to handle non-stationary dynamics allowed for more realistic modelling of complex movement data, offering deeper insights into narwhal behaviour. Our method is a powerful tool for order selection in non-stationary HMMs, with potential applications extending beyond the field of ecology.