Bayesian phylogenetics is vital for understanding evolutionary dynamics, and it requires accurate and efficient approximation of posterior distributions over trees. In this work, we develop a variational Bayesian approach for ultrametric phylogenetic trees. We present a novel variational family based on the coalescent times of a single-linkage clustering and derive a closed-form density for the resulting distribution over trees. Unlike existing methods for ultrametric trees, our method performs inference over all of tree space, requires no Markov chain Monte Carlo subroutines, and uses a differentiable variational family. Through experiments on benchmark genomic datasets and an application to the viral RNA of SARS-CoV-2, we demonstrate that our method achieves competitive accuracy while requiring significantly fewer gradient evaluations than existing state-of-the-art techniques.
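As a rough illustration of the clustering step only, the sketch below shows how single-linkage clustering turns a set of pairwise coalescent times into the merge heights of an ultrametric tree. It is not the variational family or the closed-form density described above. The function name `single_linkage_merges`, the input layout, and the toy times are all assumptions made for this example.

```python
from itertools import combinations


def single_linkage_merges(pair_times):
    """Single-linkage clustering on pairwise coalescent times (hypothetical helper).

    pair_times maps a pair of taxon names, e.g. ("A", "B"), to a positive time.
    Returns a list of (height, left_cluster, right_cluster) in merge order.
    """
    times = {frozenset(pair): t for pair, t in pair_times.items()}
    taxa = sorted({name for pair in pair_times for name in pair})
    clusters = [frozenset([name]) for name in taxa]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: the distance between two clusters is the
            # smallest pairwise time between any of their members.
            d = min(times[frozenset((x, y))] for x in a for y in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        height, a, b = best
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((height, a, b))
    return merges


# Toy input (made-up values, not data from the paper).
toy_times = {
    ("A", "B"): 1.0, ("C", "D"): 2.0, ("A", "C"): 3.0,
    ("A", "D"): 4.0, ("B", "C"): 3.5, ("B", "D"): 5.0,
}
merges = single_linkage_merges(toy_times)
heights = [h for h, _, _ in merges]
root_taxa = merges[-1][1] | merges[-1][2]
```

Each merge height serves as the coalescent time of the corresponding internal node. Single-linkage merge heights never decrease, so every leaf ends up at the same distance from the root, which is the ultrametric property.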