Given a set $X$ of species, a phylogenetic tree is an unrooted binary tree whose leaves are bijectively labelled by $X$. Such trees can be used to show how species evolve over time. One way of quantifying how topologically different two phylogenetic trees are is to construct a minimum-size agreement forest: a partition of $X$ into the smallest number of blocks such that the blocks induce homeomorphic, non-overlapping subtrees in both trees. This comparison yields insight into commonalities and differences in the evolution of $X$ across the two trees. Computing a smallest agreement forest is NP-hard (Hein, Jiang, Wang and Zhang, Discrete Applied Mathematics 71(1-3), 1996). In this work we study the problem on caterpillars, which are path-like phylogenetic trees. We demonstrate that, even when the input is limited to this highly restricted subclass, the problem remains NP-hard and is in fact APX-hard. Furthermore, we show that for caterpillars two standard reduction rules well known in the literature yield a tight kernel of size at most $7k$, compared to $15k$ for general trees (Kelk and Simone, SIAM Journal on Discrete Mathematics 33(3), 2019). Finally, we demonstrate that we can determine whether two caterpillars have an agreement forest with at most $k$ blocks in $O^*(2.49^k)$ time, compared to $O^*(3^k)$ for general trees (Chen, Fan and Sze, Theoretical Computer Science 562, 2015), where $O^*(\cdot)$ suppresses polynomial factors.
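To make the definition of an agreement forest concrete for caterpillars, the following is a minimal sketch of a checker that tests whether a given partition of $X$ is an agreement forest of two caterpillars. It is illustrative only and is not one of the algorithms from this work. It assumes each caterpillar is encoded as the list of its leaves in order along the spine, so that the first two leaves form one cherry and the last two form the other. All function names are hypothetical.

```python
def spine_positions(order):
    """Map each leaf to the index of the spine vertex it is attached to.

    Assumed encoding: `order` lists the leaves along the spine, and the
    first two leaves share a spine vertex, as do the last two.
    """
    n = len(order)
    return {leaf: min(max(i, 1), n - 2) for i, leaf in enumerate(order)}


def same_induced_tree(order1, order2, block):
    """Check that `block` induces the same leaf-labelled tree in both caterpillars."""
    s1 = [x for x in order1 if x in block]
    s2 = [x for x in order2 if x in block]
    if len(s1) <= 3:
        return True  # there is only one unrooted binary tree on at most 3 labelled leaves

    def form(s):
        # The two end cherries are unordered; the leaves in between are ordered.
        return (set(s[:2]), s[2:-2], set(s[-2:]))

    # A caterpillar can be read from either end of its spine.
    return form(s1) == form(s2) or form(s1) == form(s2[::-1])


def blocks_disjoint(order, blocks):
    """Check that the subtrees spanned by the blocks are vertex-disjoint."""
    pos = spine_positions(order)
    intervals = []
    for block in blocks:
        if len(block) >= 2:  # a singleton block uses no spine vertex
            ps = [pos[x] for x in block]
            intervals.append((min(ps), max(ps)))
    intervals.sort()
    return all(prev[1] < cur[0] for prev, cur in zip(intervals, intervals[1:]))


def is_agreement_forest(order1, order2, blocks):
    """Return True if `blocks` is an agreement forest of the two caterpillars."""
    leaves = set(order1)
    if leaves != set(order2):
        return False
    if set().union(*blocks) != leaves or sum(len(b) for b in blocks) != len(leaves):
        return False  # the blocks must partition the leaf set
    return (
        all(same_induced_tree(order1, order2, b) for b in blocks)
        and blocks_disjoint(order1, blocks)
        and blocks_disjoint(order2, blocks)
    )
```

For example, under this encoding the caterpillars with leaf orders `[1, 2, 3, 4]` and `[1, 3, 2, 4]` are different trees, so the single block `{1, 2, 3, 4}` is rejected, whereas `[{1, 2, 3}, {4}]` is accepted. The partition `[{1, 2}, {3, 4}]` is rejected because the two blocks span overlapping subtrees in the second caterpillar.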