Motivation: The abundance of gene flow in the Tree of Life challenges the notion that evolution can be represented as a fully bifurcating process, since such a process cannot capture important biological realities like hybridization, introgression, or horizontal gene transfer. Coalescent-based network methods are increasingly popular, yet they do not scale to large datasets because they require both a heuristic search over the space of networks and numerical optimization, which can be NP-hard.

Results: Here, we introduce a novel method to reconstruct phylogenetic networks based on algebraic invariants. While there is a long tradition of using algebraic invariants in phylogenetics, our work is the first to define phylogenetic invariants on concordance factors (frequencies of 4-taxon splits in the input gene trees) to identify level-1 phylogenetic networks under the multispecies coalescent model. Our inference methodology is optimization-free: it only requires the evaluation of polynomial equations, so it bypasses the traversal of network space and runs at least 10 times faster than the fastest network methods to date. We illustrate the accuracy and speed of our new method on a variety of simulated scenarios as well as in the estimation of a phylogenetic network for the genus Canis.

Availability and Implementation: Our method is implemented in the open-source Julia package PhyloDiamond.jl, available at https://github.com/solislemuslab/PhyloDiamond.jl, with broad applicability within the evolutionary biology community.

Contact: [email protected]
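To make the notion of concordance factors concrete, the following is a minimal Python sketch, not the PhyloDiamond.jl implementation. It assumes the quartet split displayed by each gene tree for a fixed set of four taxa has already been extracted. The function names (`canonical_split`, `three_splits`, `concordance_factors`, `tree_like_invariant`) are hypothetical. The invariant evaluated here is the simple tree-like one (under the multispecies coalescent on a species tree, the two minor concordance factors are expected to be equal). It is not one of the level-1 network invariants derived in this work, and it only illustrates what "evaluating a polynomial instead of optimizing" looks like.

```python
from collections import Counter
from fractions import Fraction


def canonical_split(pair_a, pair_b):
    """Order-independent representation of a 4-taxon split such as ab|cd."""
    return frozenset({frozenset(pair_a), frozenset(pair_b)})


def three_splits(taxa):
    """The three possible splits of four taxa: ab|cd, ac|bd, ad|bc."""
    a, b, c, d = taxa
    return [
        canonical_split((a, b), (c, d)),
        canonical_split((a, c), (b, d)),
        canonical_split((a, d), (b, c)),
    ]


def concordance_factors(taxa, observed_splits):
    """Frequency of each of the three splits among the observed gene trees."""
    splits = three_splits(taxa)
    counts = Counter(observed_splits)
    unknown = set(counts) - set(splits)
    if unknown:
        raise ValueError("observed splits must be splits of the given four taxa")
    total = len(observed_splits)
    if total == 0:
        raise ValueError("at least one gene tree is required")
    return [Fraction(counts[s], total) for s in splits]


def tree_like_invariant(cfs):
    """Difference of the two smallest CFs; expected to be 0 on a tree (assumption
    used for illustration only, not a network invariant from the paper)."""
    minors = sorted(cfs)[:2]
    return minors[1] - minors[0]


# Hypothetical data: 10 gene trees on taxa a, b, c, d.
taxa = ("a", "b", "c", "d")
observed = (
    [canonical_split("ab", "cd")] * 6
    + [canonical_split("ac", "bd")] * 2
    + [canonical_split("ad", "bc")] * 2
)
cfs = concordance_factors(taxa, observed)
residual = tree_like_invariant(cfs)
```

With real data the residual would not be exactly zero because of sampling noise, so in practice one would compare its magnitude across candidate hypotheses rather than test for equality.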