Designing flexible probabilistic models over tree topologies is important for developing efficient phylogenetic inference methods. Previous works often exploit the similarity of tree topologies through hand-engineered heuristic features, which require pre-sampled tree topologies and may suffer from limited approximation capability. In this paper, we propose ARTree, a deep autoregressive model for phylogenetic inference based on graph neural networks (GNNs). By decomposing a tree topology into a sequence of leaf node addition operations and modeling the resulting conditional distributions with learnable topological features computed by GNNs, ARTree provides a rich family of distributions over the entire tree topology space, with simple sampling and density estimation procedures and without heuristic features. We demonstrate the effectiveness and efficiency of our method on a benchmark of challenging real-data problems in tree topology density estimation and variational Bayesian phylogenetic inference.
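To make the leaf-addition decomposition concrete, the following is a minimal sketch of autoregressive sampling over unrooted binary tree topologies. It is an illustration under stated assumptions, not the ARTree implementation: the function name `sample_topology`, the optional `edge_probs` callback, the fixed leaf order, and the choice to start from the three-leaf tree are all hypothetical. The uniform distribution over edges is a stand-in for the GNN-parameterized conditional distributions described above.

```python
import math
import random


def sample_topology(n_leaves, rng, edge_probs=None):
    """Sample an unrooted binary topology by sequential leaf addition.

    Leaves are labelled 0..n_leaves-1 and internal nodes n_leaves, n_leaves+1, ...
    `edge_probs(edges, leaf)` is a hypothetical hook that returns one probability
    per current edge; a learned model (e.g. a GNN) could be plugged in here.
    Returns the edge list and the log-probability of the sampled decisions.
    """
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    next_internal = n_leaves
    center = next_internal
    next_internal += 1
    # Assumed starting point: the unique topology on the first three leaves.
    edges = [(0, center), (1, center), (2, center)]
    log_prob = 0.0
    for leaf in range(3, n_leaves):
        if edge_probs is None:
            # Placeholder conditional: uniform over the current edges.
            probs = [1.0 / len(edges)] * len(edges)
        else:
            probs = edge_probs(edges, leaf)
        idx = rng.choices(range(len(edges)), weights=probs, k=1)[0]
        log_prob += math.log(probs[idx])
        # Subdivide the chosen edge with a new internal node and attach the leaf.
        u, v = edges[idx]
        mid = next_internal
        next_internal += 1
        edges[idx] = (u, mid)
        edges.extend([(mid, v), (mid, leaf)])
    return edges, log_prob


if __name__ == "__main__":
    topology, lp = sample_topology(6, random.Random(0))
    print(len(topology), lp)
```

With the uniform placeholder, every topology on n leaves receives probability 1/(2n-5)!!, the reciprocal of the number of unrooted binary topologies. A learned conditional would reshape this distribution while sampling and log-probability accumulation stay the same.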