Recently, a new vector encoding, Ordered Leaf Attachment (OLA), was introduced that represents $n$-leaf phylogenetic trees as integer vectors of length $n-1$ by recording the placement location of each leaf. Both encoding and decoding of trees run in linear time and depend on a fixed ordering of the leaves. Here, we investigate the connection between OLA vectors and the maximum acyclic agreement forest (MAAF) problem. A MAAF represents an optimal decomposition of $k$ trees into reticulation-free subtrees, with the roots of these subtrees representing reticulation events. We introduce a corrected OLA distance index over the OLA vectors of $k$ trees, which is easily computable in linear time. We prove that, given an optimal leaf ordering that minimizes this distance, the corrected OLA distance corresponds to the size of a MAAF. Additionally, a MAAF can be easily reconstructed from optimal OLA vectors. We extend these results to multifurcated trees: we introduce an $O(kn \cdot m\log m)$ algorithm that optimally resolves a set of multifurcated trees given a leaf ordering, where $m$ is the size of a largest multifurcation, and show that trees resolved via this algorithm also minimize the size of a MAAF. These results suggest a new approach to the fast computation of phylogenetic networks and the identification of reticulation events via random permutations of leaves. Additionally, in the case of microbial evolution, a natural ordering of leaves is often given by the sample collection date, which means that, under mild assumptions, reticulation events can be identified in polynomial time on such datasets.
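To make the idea of recording each leaf's placement concrete, the following is a minimal Python sketch of decoding a leaf-attachment vector into a rooted binary tree. The labeling convention is an assumption made for illustration and may differ from the one used for OLA: leaves are numbered $0, \dots, n-1$ in the fixed order, the entry for leaf $i$ names the node whose parent edge is subdivided (a nonnegative value $j < i$ for leaf $j$, a negative value $-j$ for the internal node created when leaf $j$ was attached), and attaching leaf $i$ creates internal node $-i$. The function names `decode_attachment_vector` and `to_newick` are hypothetical. The corrected OLA distance itself is not sketched here, since its definition is not given in this abstract.

```python
def decode_attachment_vector(vec):
    """Rebuild a rooted binary tree from a leaf-attachment vector.

    Assumed convention (illustrative, not necessarily the paper's):
    vec[i-1] is the node above which leaf i is attached, where
    j >= 0 denotes leaf j and -j denotes the internal node created
    when leaf j was attached. Runs in time linear in len(vec).
    """
    children = {}        # internal node label -> [left child, right child]
    parent = {0: None}   # the starting tree is leaf 0 alone
    root = 0
    for i, target in enumerate(vec, start=1):
        if not -(i - 1) <= target <= i - 1:
            raise ValueError(f"entry {target} is out of range for leaf {i}")
        new = -i                      # internal node created by this attachment
        old_parent = parent[target]
        children[new] = [target, i]   # subdivide the edge above `target`
        parent[new] = old_parent
        parent[target] = new
        parent[i] = new
        if old_parent is None:
            root = new                # attached above the current root
        else:
            siblings = children[old_parent]
            siblings[siblings.index(target)] = new
    return root, children


def to_newick(node, children):
    """Serialize the subtree below `node` as a Newick-style string (no ';')."""
    if node >= 0:
        return str(node)
    left, right = children[node]
    return "(" + to_newick(left, children) + "," + to_newick(right, children) + ")"


if __name__ == "__main__":
    root, children = decode_attachment_vector([0, 0, -1])
    print(to_newick(root, children))
```

Under this convention the entry for leaf $i$ has $2i-1$ admissible values, so vectors of length $n-1$ are in bijection with the $(2n-3)!!$ rooted binary trees on $n$ labeled leaves.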