Given a set $X$ of species, a phylogenetic tree is an unrooted binary tree whose leaves are bijectively labelled by $X$. Such trees can be used to show the way species evolve over time. One way of understanding how topologically different two phylogenetic trees are is to construct a minimum-size agreement forest: a partition of $X$ into the smallest number of blocks, such that the blocks induce homeomorphic, non-overlapping subtrees in both trees. This comparison yields insight into commonalities and differences in the evolution of $X$ across the two trees. Computing a smallest agreement forest is NP-hard (Hein, Jiang, Wang and Zhang, Discrete Applied Mathematics 71(1-3), 1996). In this work we study the problem on caterpillars, which are path-like phylogenetic trees. We show that, even when the input is limited to this highly restricted subclass, the problem remains NP-hard and is in fact APX-hard. Furthermore, we show that for caterpillars two standard reduction rules well known in the literature yield a tight kernel of size at most $7k$, compared to $15k$ for general trees (Kelk and Linz, SIAM Journal on Discrete Mathematics 33(3), 2019). Finally, we demonstrate that we can determine whether two caterpillars have an agreement forest with at most $k$ blocks in $O^*(2.49^k)$ time, compared to $O^*(3^k)$ for general trees (Chen, Fan and Sze, Theoretical Computer Science 562, 2015), where $O^*(\cdot)$ suppresses polynomial factors.