Phylogenetic trees are simple models of evolutionary processes. They describe conditionally independent divergent evolution of taxa from common ancestors. Phylogenetic trees often lack the flexibility to adequately model all evolutionary processes; one example is introgressive hybridization, in which genes flow from one taxon to another. Phylogenetic networks model evolution that is not fully described by a phylogenetic tree. However, many phylogenetic network models assume that ancestral taxa merge instantaneously to form ``hybrid'' descendant taxa. In contrast, our convergence-divergence models retain a single underlying ``principal'' tree, but permit gene flow over arbitrary time frames. Alternatively, convergence-divergence models can describe other biological processes that lead to taxa becoming more similar over a time frame, such as replicated evolution. Here we present novel maximum likelihood-based algorithms that use a quartet-based approach to infer most aspects of $N$-taxon convergence-divergence models, many of them consistently. The algorithms can be applied to multiple sequence alignments restricted to genes or genomic windows, or to gene presence/absence datasets.