Phylogenetic networks are a flexible model of evolution that can represent reticulate evolution and handle complex data. Tree-based networks, which are phylogenetic networks that have a spanning tree with the same root and leaf set as the network itself, have been well studied. However, not all networks are tree-based. Francis, Semple and Steel (2018) therefore introduced several indices that measure how far a rooted binary phylogenetic network $N$ deviates from being tree-based. These include the minimum number $\delta^\ast(N)$ of additional leaves needed to make $N$ tree-based, and the minimum difference $\eta^\ast(N)$ between the number of vertices of $N$ and the number of vertices of a subtree of $N$ that shares the root and leaf set with $N$. Hayamizu (2021) established a canonical decomposition of almost-binary phylogenetic networks $N$, called the maximal zig-zag trail decomposition, which has many implications, including a linear-time algorithm for computing $\delta^\ast(N)$. The Maximum Covering Subtree Problem (MCSP) is the problem of computing $\eta^\ast(N)$, and Davidov et al. (2022) showed that it can be solved in polynomial time (in cubic time when $N$ is binary) by an algorithm for the minimum cost flow problem. In this paper, under the assumption that $N$ is almost-binary (i.e. each internal vertex has in-degree and out-degree at most two), we show that $\delta^\ast(N)\leq \eta^\ast(N)$ holds and that this bound is tight, and we give a characterisation of the phylogenetic networks $N$ that satisfy $\delta^\ast(N)=\eta^\ast(N)$. Our approach uses the canonical decomposition of $N$ and focuses on how the maximal W-fences (i.e. the forbidden subgraphs of tree-based networks) are connected to maximal M-fences in the network $N$. Our results introduce a new class of phylogenetic networks for which MCSP can be solved in linear time, which can be seen as a generalisation of tree-based networks.
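In symbols, the second index and the main inequality can be written as follows. This is only a restatement of the definitions above; the notation $V(\cdot)$ for a vertex set and $\mathcal{T}(N)$ for the set of subtrees of $N$ sharing the root and leaf set of $N$ is assumed here for illustration and does not come from the text.

\[
\eta^\ast(N) \;=\; \min_{T \in \mathcal{T}(N)} \bigl( |V(N)| - |V(T)| \bigr),
\qquad
\delta^\ast(N) \;\leq\; \eta^\ast(N).
\]

In particular, $N$ is tree-based exactly when some $T \in \mathcal{T}(N)$ is spanning, that is, when $\eta^\ast(N) = 0$, in which case $\delta^\ast(N) = 0$ as well.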