Hidden Markov models (HMMs) have been widely used by scientists to model stochastic systems in which the underlying process is a discrete Markov chain and the observations are noisy realizations of that process. Determining the number of hidden states of an HMM is a model selection problem that has yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covariance. In this paper, we propose a consistent method for determining the number of hidden states of an HMM based on the marginal likelihood, which is obtained by integrating out both the parameters and the hidden states. Moreover, we show that the model selection problem for HMMs includes the order selection problem for finite mixture models as a special case. We give a rigorous proof of the consistency of the proposed marginal likelihood method and provide an efficient computation method for practical implementation. We numerically compare the proposed method with the Bayesian information criterion (BIC), demonstrating the effectiveness of the proposed marginal likelihood method.
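The relationship between HMMs and finite mixture models noted above can be illustrated with a minimal sketch. It is not the proposed marginal likelihood method: it only sums out the hidden states for fixed parameters using the standard scaled forward recursion, and does not integrate over the parameters. The function names (`hmm_log_likelihood`, `mixture_log_likelihood`), the one-dimensional Gaussian emissions, and all numerical values are assumptions chosen for illustration. When every row of the transition matrix equals the initial distribution, the hidden states are independent draws, and the HMM likelihood should coincide with that of a Gaussian mixture with the same weights.

```python
import numpy as np

def gaussian_pdf(y, mean, std):
    """Density of N(mean, std^2) evaluated at y (broadcasts over arrays)."""
    return np.exp(-0.5 * ((y - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def hmm_log_likelihood(y, init, trans, means, stds):
    """Log-likelihood of a 1-D Gaussian HMM for fixed parameters.

    Uses the scaled forward recursion, which sums over all hidden-state
    paths. Parameters are NOT integrated out here (illustrative only).
    """
    emis = gaussian_pdf(y[:, None], means[None, :], stds[None, :])  # shape (n, K)
    alpha = init * emis[0]
    log_lik = 0.0
    for t in range(len(y)):
        if t > 0:
            # Propagate the normalized filter one step, then weight by emission.
            alpha = (alpha @ trans) * emis[t]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale  # renormalize to avoid underflow
    return log_lik

def mixture_log_likelihood(y, weights, means, stds):
    """Log-likelihood of i.i.d. data under a 1-D Gaussian mixture."""
    emis = gaussian_pdf(y[:, None], means[None, :], stds[None, :])
    return np.log(emis @ weights).sum()

# Hypothetical data and parameters, chosen only for illustration.
rng = np.random.default_rng(0)
y = rng.normal(size=50)
weights = np.array([0.3, 0.7])
means = np.array([-1.0, 1.5])
stds = np.array([0.5, 2.0])          # state-specific (heterogeneous) variances

# Identical rows: the next state does not depend on the current state.
trans = np.tile(weights, (2, 1))

hmm_ll = hmm_log_likelihood(y, weights, trans, means, stds)
mix_ll = mixture_log_likelihood(y, weights, means, stds)
print(np.isclose(hmm_ll, mix_ll))
```

With a transition matrix whose rows differ, the two quantities generally diverge, which is the sense in which the mixture model is a restricted case of the HMM.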