Bayesian phylogenetic inference is currently performed via Markov chain Monte Carlo (MCMC) with simple proposal mechanisms. This hinders exploration efficiency and often requires long runs to deliver accurate posterior estimates. In this paper, we present an alternative approach: a variational framework for Bayesian phylogenetic analysis. To build a suitable variational family of distributions, we propose combining subsplit Bayesian networks, an expressive graphical model for tree topology distributions, with a structured amortization of the branch lengths over tree topologies. We train the variational approximation via stochastic gradient ascent, using separate gradient estimators for the continuous and discrete variational parameters to handle the composite latent space of phylogenetic models. We show that our variational approach achieves performance competitive with MCMC while requiring far fewer (though more costly) iterations, owing to the more efficient exploration mechanism that variational inference enables. Experiments on a benchmark of challenging real-data Bayesian phylogenetic inference problems demonstrate the effectiveness and efficiency of our methods.
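The key technical point of the abstract is that the latent space mixes a discrete tree topology with continuous branch lengths, so the ELBO gradient is estimated differently for each part. As a minimal sketch, assuming a toy model with three hypothetical "topologies" and a single log-normal branch length per topology (this is not the paper's subsplit Bayesian network, and it has no structured amortization, since each toy topology gets its own parameters), the code below uses a score-function (REINFORCE) estimator with a leave-one-out baseline for the discrete parameters and the reparameterization trick for the continuous ones. The names `W`, `LAM`, `log_target`, `sample`, `elbo_grads` and `fit` are illustrative assumptions; the estimators used in the paper itself may differ.

```python
import numpy as np

# Hypothetical toy model (illustration only, not the paper's SBN):
#   discrete tau in {0, ..., K-1} stands in for a tree topology,
#   continuous b > 0 stands in for a single branch length.
# Unnormalized target: p(tau, b) = W[tau] * LAM * exp(-LAM * b).
W = np.array([0.6, 0.3, 0.1])
LAM = 1.0


def log_target(tau, b):
    return np.log(W[tau]) + np.log(LAM) - LAM * b


def softmax(phi):
    z = np.exp(phi - phi.max())
    return z / z.sum()


def sample(phi, mu, log_sig, n, rng):
    """Draw n samples from q(tau) q(b | tau) and return the ELBO integrand."""
    q_tau = softmax(phi)
    tau = rng.choice(len(phi), size=n, p=q_tau)
    eps = rng.standard_normal(n)
    sig = np.exp(log_sig[tau])
    b = np.exp(mu[tau] + sig * eps)  # reparameterized log-normal draw
    log_q = (np.log(q_tau[tau]) - np.log(b) - np.log(sig)
             - 0.5 * np.log(2 * np.pi) - 0.5 * eps ** 2)
    f = log_target(tau, b) - log_q  # log p(tau, b) - log q(tau, b)
    return q_tau, tau, eps, sig, b, f


def elbo_estimate(phi, mu, log_sig, n, rng):
    return sample(phi, mu, log_sig, n, rng)[-1].mean()


def elbo_grads(phi, mu, log_sig, n, rng):
    q_tau, tau, eps, sig, b, f = sample(phi, mu, log_sig, n, rng)
    K = len(phi)
    # Discrete parameters: score-function (REINFORCE) estimator with a
    # leave-one-out baseline; grad of log softmax(phi)[tau] is onehot - q_tau.
    # The direct dependence of f on phi via log q(tau) has zero expected gradient.
    baseline = (f.sum() - f) / (n - 1)
    g_phi = ((f - baseline)[:, None] * (np.eye(K)[tau] - q_tau)).mean(axis=0)
    # Continuous parameters: reparameterization gradients of f through b.
    d_mu = -LAM * b + 1.0                          # df / dmu[tau]
    d_ls = -LAM * b * sig * eps + sig * eps + 1.0  # df / dlog_sig[tau]
    g_mu = np.bincount(tau, weights=d_mu, minlength=K) / n
    g_ls = np.bincount(tau, weights=d_ls, minlength=K) / n
    return g_phi, g_mu, g_ls


def fit(n_iters=3000, n=100, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    K = len(W)
    phi, mu, log_sig = np.zeros(K), np.zeros(K), np.zeros(K)
    for _ in range(n_iters):
        g_phi, g_mu, g_ls = elbo_grads(phi, mu, log_sig, n, rng)
        phi += lr * g_phi  # stochastic gradient *ascent* on the ELBO
        mu += lr * g_mu
        log_sig += lr * g_ls
    return phi, mu, log_sig


rng = np.random.default_rng(1)
zeros = np.zeros(len(W))
elbo_before = elbo_estimate(zeros, zeros, zeros, 20000, rng)
phi, mu, log_sig = fit()
elbo_after = elbo_estimate(phi, mu, log_sig, 20000, rng)
print("q(tau):", softmax(phi))
print("ELBO before/after:", elbo_before, elbo_after)
```

Because the exponential branch-length factor integrates to one for every toy topology, the exact marginal posterior over `tau` is `W` itself, so `softmax(phi)` should drift toward `W`, while `mu` and `log_sig` should approach the best log-normal fit to an Exponential(1) density under this KL direction (`mu = -0.5`, `sigma = 1`).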