Inferring a state sequence from a sequence of measurements is a fundamental problem in bioinformatics and natural language processing. The Viterbi and Beam Search (BS) algorithms are popular inference methods, but they have limitations when applied to Hierarchical Hidden Markov Models (HHMMs), where the interest lies in the outer state sequence. The Viterbi algorithm cannot infer outer states without also inferring inner states, while the BS algorithm requires marginalization over prohibitively large state spaces. We propose two new algorithms to overcome these limitations: the greedy marginalized BS algorithm and the local focus BS algorithm. We show that they approximate the most likely outer state sequence better than the Viterbi algorithm does, and we evaluate their performance on an explicit duration HMM using both simulated data and nanopore base calling data.
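The distinction between the two objectives can be made concrete on a toy model. Viterbi maximizes over joint (outer, inner) paths, so projecting its path onto outer labels need not give the outer sequence with the highest marginal probability, which requires summing over inner states. The sketch below is illustrative only and is not the paper's greedy marginalized BS or local focus BS algorithm. It assumes a flattened HHMM in which every hidden state `h` carries an outer label `LABEL[h]`, uses one outer label per time step (no collapsing of repeated labels into segments), and uses made-up parameters `PI`, `TRANS`, and `EMIT`. It contrasts Viterbi, exact enumeration, and a generic marginalized beam search in which each beam keeps a forward vector over the hidden states consistent with its outer prefix.

```python
import itertools
import numpy as np

# Hypothetical toy model, flattened: each hidden state stands for an (outer, inner) pair.
LABEL = np.array([0, 1, 1])          # state 0 -> outer 0; states 1 and 2 -> outer 1
N_OUTER = 2
PI = np.array([0.4, 0.3, 0.3])       # initial distribution over hidden states
TRANS = np.array([[0.6, 0.2, 0.2],   # TRANS[i, j] = P(h_t = j | h_{t-1} = i)
                  [0.4, 0.6, 0.0],
                  [0.4, 0.0, 0.6]])
EMIT = np.array([[0.7, 0.3],         # EMIT[h, o] = P(obs = o | hidden = h)
                 [0.4, 0.6],
                 [0.5, 0.5]])


def viterbi(obs):
    """Most likely joint hidden path (outer and inner states maximized together)."""
    delta = PI * EMIT[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] * TRANS          # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * EMIT[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(back):                    # backtrack from the last transition
        path.append(int(bp[path[-1]]))
    return path[::-1]


def outer_prob(obs, labels):
    """P(obs, outer labels): forward pass restricted to matching states, summing inner states."""
    alpha = PI * EMIT[:, obs[0]] * (LABEL == labels[0])
    for o, lab in zip(obs[1:], labels[1:]):
        alpha = (alpha @ TRANS) * EMIT[:, o] * (LABEL == lab)
    return alpha.sum()


def brute_force_outer(obs):
    """Exact most likely outer sequence by enumeration (exponential in len(obs))."""
    seqs = itertools.product(range(N_OUTER), repeat=len(obs))
    return max(seqs, key=lambda s: outer_prob(obs, s))


def marginalized_beam(obs, k):
    """Generic beam search over outer prefixes. Each beam carries a forward vector,
    so inner states are summed out instead of maximized."""
    beams = [((lab,), PI * EMIT[:, obs[0]] * (LABEL == lab)) for lab in range(N_OUTER)]
    beams = sorted(beams, key=lambda b: b[1].sum(), reverse=True)[:k]
    for o in obs[1:]:
        candidates = []
        for prefix, alpha in beams:
            pred = alpha @ TRANS                 # one-step prediction over hidden states
            for lab in range(N_OUTER):
                candidates.append((prefix + (lab,), pred * EMIT[:, o] * (LABEL == lab)))
        beams = sorted(candidates, key=lambda b: b[1].sum(), reverse=True)[:k]
    return beams[0][0]


if __name__ == "__main__":
    obs = [0, 1, 1, 0, 1]
    vit_outer = tuple(int(LABEL[h]) for h in viterbi(obs))
    best = brute_force_outer(obs)
    beam = marginalized_beam(obs, k=4)
    for name, seq in [("viterbi", vit_outer), ("exact", best), ("beam", beam)]:
        print(name, seq, outer_prob(obs, seq))
```

When the beam width is at least `N_OUTER ** len(obs)`, nothing is pruned and the beam search returns an exact maximizer. Narrower beams trade accuracy for cost. The per-beam forward vector has one entry per hidden state, which is the marginalization cost that becomes prohibitive when the inner state space (for example, explicit durations) is large.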