Improved order selection method for hidden Markov models: a case study with movement data

Hidden Markov models (HMMs) are a versatile statistical framework commonly used in ecology to characterize behavioural patterns from animal movement data. In HMMs, the observed data depend on a finite number of underlying hidden states, generally interpreted as the animal's unobserved behaviour. The number of states is a crucial parameter, controlling the trade-off between ecological interpretability of behaviours (fewer states) and the goodness of fit of the model (more states). Selecting the number of states, commonly referred to as order selection, is notoriously challenging. Common model selection metrics, such as AIC and BIC, often perform poorly in determining the number of states, particularly when models are misspecified. Building on existing methods for HMMs and mixture models, we propose a double penalized likelihood maximum estimate (DPMLE) for the simultaneous estimation of the number of states and parameters of non-stationary HMMs. The DPMLE differs from traditional information criteria by using two penalty functions on the stationary probabilities and state-dependent parameters. For non-stationary HMMs, forward and backward probabilities are used to approximate stationary probabilities. Using a simulation study that includes scenarios with additional complexity in the data, we compare the performance of our method with that of AIC and BIC. We also illustrate how the DPMLE differs from AIC and BIC using narwhal (Monodon monoceros) movement data. The proposed method outperformed AIC and BIC in identifying the correct number of states under model misspecification. Furthermore, its capacity to handle non-stationary dynamics allowed for more realistic modeling of complex movement data, offering deeper insights into narwhal behaviour. Our method is a powerful tool for order selection in non-stationary HMMs, with potential applications extending beyond the field of ecology.
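As a rough illustration of the quantities the abstract refers to, the sketch below runs a scaled forward-backward pass for an HMM with time-varying transition matrices. It returns the log-likelihood and the smoothed state probabilities, and computes AIC and BIC from that log-likelihood. This is not the authors' DPMLE implementation: the function names `forward_backward` and `aic_bic` are hypothetical, the penalty functions are not reproduced because the abstract does not specify them, and averaging the smoothed state probabilities over time is only one assumed way to stand in for stationary probabilities in a non-stationary model.

```python
import numpy as np


def forward_backward(delta, gammas, dens):
    """Scaled forward-backward pass for an HMM (illustrative sketch).

    delta  : (K,) initial state distribution
    gammas : (T-1, K, K) transition matrices, possibly different at each step;
             gammas[t][i, j] = P(S_{t+1} = j | S_t = i)
    dens   : (T, K) state-dependent densities evaluated at each observation
    Returns (log_likelihood, smoothed), where smoothed[t, k] = P(S_t = k | data).
    """
    T, K = dens.shape
    alpha = np.zeros((T, K))
    scale = np.zeros(T)

    # Forward pass, rescaling at each step to avoid numerical underflow.
    a = delta * dens[0]
    scale[0] = a.sum()
    alpha[0] = a / scale[0]
    for t in range(1, T):
        a = (alpha[t - 1] @ gammas[t - 1]) * dens[t]
        scale[t] = a.sum()
        alpha[t] = a / scale[t]

    # Backward pass, using the same scaling constants.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        b = gammas[t] @ (dens[t + 1] * beta[t + 1])
        beta[t] = b / scale[t + 1]

    smoothed = alpha * beta
    smoothed /= smoothed.sum(axis=1, keepdims=True)  # guard against rounding
    return np.log(scale).sum(), smoothed


def aic_bic(loglik, n_params, n_obs):
    """Standard AIC and BIC from a maximized log-likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_obs)
    return aic, bic


if __name__ == "__main__":
    # Toy 2-state example with made-up density values (not real data).
    delta = np.array([0.5, 0.5])
    G = np.array([[0.9, 0.1], [0.2, 0.8]])
    gammas = np.repeat(G[None, :, :], 3, axis=0)
    dens = np.array([[0.5, 0.1], [0.2, 0.7], [0.3, 0.3], [0.6, 0.2]])

    loglik, smoothed = forward_backward(delta, gammas, dens)
    # Assumed stand-in for stationary probabilities: time-averaged occupancy.
    approx_stationary = smoothed.mean(axis=0)
    print(loglik, approx_stationary, aic_bic(loglik, n_params=3, n_obs=4))
```

In an order-selection workflow based on information criteria, a model would be fitted for each candidate number of states and the one with the lowest AIC or BIC retained. The abstract's point is that this comparison often selects the wrong number of states under misspecification, which is what the two penalties in the DPMLE are meant to address.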
