Phylogenetic networks are graphs inferred from molecular sequence data that represent ancestral histories shaped by reticulate processes such as recombination, hybridization, and horizontal gene transfer. We introduce a family of distance metrics for rooted, ranked, unlabeled phylogenetic networks, extending a previously developed distance for ranked trees. Our approach relies on a bijective triangular matrix representation of phylogenetic networks that captures the temporal order of internal events (speciations and hybridizations). Our metrics, defined through standard matrix norms, enable efficient quantitative comparison of network topologies, of timed networks, and of networks with different numbers of hybridizations. The metrics apply both to isochronous networks, in which all tips are sampled at a single time point, and to heterochronous networks, in which tips may be sampled at different time points. We show that our metrics capture biologically meaningful differences among evolutionary histories, in both simulations and empirical posterior distributions of viral phylogenetic networks. These tools fill a methodological gap by enabling principled comparison of ranked, unlabeled phylogenetic networks, including ancestral recombination graphs.
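The metrics are described as standard matrix norms applied to the triangular encoding. One natural reading, used here only as an assumption, is d(N1, N2) = ||F(N1) - F(N2)||, where F is the triangular matrix encoding of a network. The sketch below shows that final step only. It assumes both networks have already been encoded as triangular matrices of the same size; the encoding itself is not reproduced, and the function name `entrywise_norm_distance` and the example matrices are hypothetical illustrative values, not encodings of real networks.

```python
import math
from typing import Sequence

# Hypothetical type: a triangular matrix stored as rows (square with zeros
# above the diagonal, or ragged with row i holding i + 1 entries).
Matrix = Sequence[Sequence[float]]


def entrywise_norm_distance(a: Matrix, b: Matrix, p: float = 1.0) -> float:
    """Entrywise L_p norm of (a - b): p=1 gives the L1 norm, p=2 the Frobenius norm.

    Assumes both encodings have identical shape; how networks of different
    sizes are aligned in the original method is not reproduced here.
    """
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a norm")
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("encodings must have the same shape")
    total = 0.0
    for ra, rb in zip(a, b):
        for x, y in zip(ra, rb):
            # Entries above the diagonal are zero in both matrices,
            # so they contribute nothing to the sum.
            total += abs(x - y) ** p
    return total ** (1.0 / p)


# Made-up lower-triangular matrices for illustration only.
A = [[1, 0, 0], [2, 1, 0], [3, 2, 1]]
B = [[1, 0, 0], [1, 1, 0], [3, 1, 1]]
print(entrywise_norm_distance(A, B, p=1))  # two entries differ by 1 each
print(entrywise_norm_distance(A, B, p=2))
```

Because the distance is a norm of a difference, it is zero for identical encodings, symmetric, and satisfies the triangle inequality, which is what makes it a metric on the encoded networks, provided the encoding is bijective.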