Few, if any, algorithms in statistical phylogenetics are used more heavily than Felsenstein's 1973 pruning method for computing the likelihood of a tree. We present LvD (Likelihood via Decomposition), an alternative to Felsenstein's algorithm based on a different decomposition of the underlying phylogeny. It works for all standard nucleotide models. The new algorithm updates the likelihood calculation in worst-case $O(\log n)$ time for $n$ taxa, as opposed to worst-case $O(n)$ time for existing methods. In practice this leads to appreciable improvements in likelihood calculations, with the extent of the speed-up depending on how balanced or unbalanced the trees are. We explore the implications for parallel computing and show that the approach allows likelihoods to be computed in $O(\log n)$ parallel time per site, compared with worst-case $O(n)$ time. We implemented the algorithm and applied it to large numbers of simulated and empirical data sets, and showed that these theoretical advances lead to a significant practical speed-up, although the extent of the improvement depends on how balanced the phylogenies already are.
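For orientation, the baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of Felsenstein's pruning recursion for a single site, not an implementation of LvD. It assumes the Jukes-Cantor (JC69) model with uniform base frequencies, and the tree encoding (a leaf is a name string, an internal node is a tuple of `(child, branch_length)` pairs) and the function names `jc69_prob`, `partial_likelihoods` and `site_likelihood` are hypothetical choices made for this example.

```python
import math

STATES = "ACGT"


def jc69_prob(same, t):
    """JC69 transition probability over a branch of length t
    (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial_likelihoods(node, site):
    """Return the four conditional likelihoods at `node` for one site.

    node: a leaf name (str), or a tuple of (child, branch_length) pairs.
    site: dict mapping leaf name -> observed nucleotide.
    """
    if isinstance(node, str):
        # Leaf: likelihood 1 for the observed state, 0 otherwise.
        return [1.0 if s == site[node] else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, t in node:
        child_partials = partial_likelihoods(child, site)
        for i in range(4):
            # Sum over the child's possible states j, then multiply
            # the contributions of all children together.
            result[i] *= sum(
                jc69_prob(i == j, t) * child_partials[j] for j in range(4)
            )
    return result


def site_likelihood(root, site):
    """Site likelihood: root partials weighted by uniform base frequencies."""
    return sum(0.25 * p for p in partial_likelihoods(root, site))


# Example: a three-taxon tree ((x:0.1, y:0.2):0.05, z:0.3).
tree = ((((("x", 0.1), ("y", 0.2))), 0.05), ("z", 0.3))
likelihood = site_likelihood(tree, {"x": "A", "y": "A", "z": "G"})
```

In this scheme, changing one branch length invalidates the partial likelihoods at every node on the path from that branch to the root. On a highly unbalanced (caterpillar) tree that path has length proportional to $n$, which is where the worst-case $O(n)$ update cost of existing methods comes from.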