Hidden Markov Models (HMMs) are powerful tools for modeling sequential data in which the underlying states evolve stochastically and are only indirectly observable. Traditional HMM approaches are well established for linear sequences and have been extended to other structures such as trees. In this paper, we extend the framework of HMMs on trees to address scenarios where the tree-like structure of the data includes coupled branches, a common feature in biological systems where entities within the same lineage exhibit dependent characteristics. We develop a dynamic programming algorithm that efficiently solves the likelihood, decoding, and parameter learning problems for tree-based HMMs with coupled branches. Our approach scales polynomially with the number of states and nodes, which makes it computationally feasible for a wide range of applications, and it does not suffer from the numerical underflow problem. We demonstrate our algorithm by applying it to simulated data and propose self-consistency checks for validating the assumptions of the model used for inference. This work not only advances the theoretical understanding of HMMs on trees but also provides a practical tool for analyzing complex biological data where dependencies between branches cannot be ignored.
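To make the likelihood computation concrete, the following is a minimal sketch of an upward (leaves-to-root) dynamic programming pass in log space. It is not the paper's algorithm: it assumes a binary tree in which the two children of a node draw their states jointly given the parent state, which is one plausible reading of "coupled branches". The names `log_likelihood`, `log_pair_trans`, `log_emit`, `log_init`, and `children` are hypothetical and introduced only for illustration. Working with log-probabilities and a log-sum-exp reduction is a standard way to avoid underflow, and the cost is on the order of N·K³ for N nodes and K states.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(children, log_emit, log_init, log_pair_trans, root=0):
    """Log-likelihood of observations on a binary tree (illustrative sketch).

    Assumed inputs (all hypothetical names):
      children[v]             -> (left, right) child ids; absent or () for a leaf
      log_emit[v][s]          -> log P(observation at node v | state s)
      log_init[s]             -> log P(root state = s)
      log_pair_trans[s][a][b] -> log P(left = a, right = b | parent = s),
                                 a joint table, so sibling states may be coupled
    """
    num_states = len(log_init)

    def upward(v):
        # Returns beta[s] = log P(observations in the subtree of v | state of v = s).
        kids = children.get(v, ())
        if not kids:
            return list(log_emit[v])
        left, right = kids
        beta_left, beta_right = upward(left), upward(right)
        beta = []
        for s in range(num_states):
            # Sum over all joint child states (a, b) in log space.
            terms = [
                log_pair_trans[s][a][b] + beta_left[a] + beta_right[b]
                for a in range(num_states)
                for b in range(num_states)
            ]
            beta.append(log_emit[v][s] + logsumexp(terms))
        return beta

    beta_root = upward(root)
    return logsumexp([log_init[s] + beta_root[s] for s in range(num_states)])
```

The recursion is kept for readability; for very deep trees an explicit post-order traversal would avoid Python's recursion limit. Replacing the log-sum-exp over `(a, b)` with a maximum (and recording the maximizing pair) would give a Viterbi-style decoding pass under the same assumptions.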