Hidden Markov models (HMMs) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain, and the observations are noisy realizations of that process. Determining the number of hidden states of an HMM is a model selection problem that has yet to be satisfactorily solved, especially for the popular Gaussian HMM with heterogeneous covariance. In this paper, we propose a consistent method for determining the number of hidden states of an HMM based on the marginal likelihood, which is obtained by integrating out both the parameters and the hidden states. Moreover, we show that the model selection problem for HMMs includes the order selection problem for finite mixture models as a special case. We give a rigorous proof of the consistency of the proposed marginal likelihood method and provide an efficient computational method for practical implementation. We numerically compare the proposed method with the Bayesian information criterion (BIC), demonstrating the effectiveness of the proposed marginal likelihood method.
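The relationship between HMMs and finite mixture models stated above can be illustrated with a minimal sketch. It assumes a univariate Gaussian HMM with fixed parameters and uses the standard scaled forward algorithm to sum out the hidden states. When every row of the transition matrix equals the mixture weights, the hidden states are independent draws and the HMM likelihood coincides with the mixture likelihood. The function names and all numerical values are illustrative assumptions, not taken from the paper. The sketch also evaluates the likelihood at fixed parameters only, so it is not the marginal likelihood proposed here, which additionally integrates over the parameters.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a univariate normal N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def hmm_log_likelihood(ys, init, trans, means, variances):
    """Log-likelihood of a 1-D Gaussian HMM via the scaled forward algorithm.

    The hidden states are summed out; the parameters are held fixed.
    """
    k = len(init)
    # alpha[j] is proportional to p(y_1, ..., y_t, x_t = j)
    alpha = [init[j] * gaussian_pdf(ys[0], means[j], variances[j]) for j in range(k)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for y in ys[1:]:
        # Propagate through the transition matrix, then weight by the emission density.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(k))
            * gaussian_pdf(y, means[j], variances[j])
            for j in range(k)
        ]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def mixture_log_likelihood(ys, weights, means, variances):
    """Log-likelihood of i.i.d. observations from a finite Gaussian mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(y, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for y in ys
    )


# Illustrative (made-up) data and parameters.
ys = [0.1, 2.3, -0.4, 1.9, 0.2]
weights = [0.3, 0.7]
means = [0.0, 2.0]
variances = [1.0, 0.5]

# Identical rows: the next state does not depend on the current one.
trans = [weights, weights]

hmm_ll = hmm_log_likelihood(ys, weights, trans, means, variances)
mix_ll = mixture_log_likelihood(ys, weights, means, variances)
print(abs(hmm_ll - mix_ll) < 1e-9)
```

With identical transition rows, the two log-likelihoods agree up to floating-point error, which is the sense in which a finite mixture is a degenerate HMM.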