Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from a particular genomic signature, while phylogenetic methods use sequences to reconstruct the gene and species trees. However, it is well known that sequence-based methods have difficulty identifying ancient transfers, since mutations have had enough time to erase the evidence of such events. In this work, we ask whether character-based methods can predict gene transfers. Their advantage over sequence-based methods is that homologous genes can have low DNA similarity yet still retain enough important common motifs to share character traits, for instance the same functional or expression profile. A phylogeny in which two separate clades acquired the same character independently might indicate the presence of a transfer, even in the absence of sequence similarity. We introduce perfect transfer networks, which are phylogenetic networks that can explain the character diversity of a set of taxa under the assumption that characters have unique births and that, once a character is gained, it is rarely lost. Examples of such traits include transposable elements, biochemical markers, and the emergence of organelles, to name a few. We study the differences between our model and two similar models: perfect phylogenetic networks and ancestral recombination networks. Our goal is to initiate a study of the structural and algorithmic properties of perfect transfer networks. We show that one can decide in polynomial time whether a given network is a valid explanation for a set of taxa, and we show how, for a given tree, one can add transfer edges so that it explains a set of taxa. Finally, we provide lower and upper bounds on the number of transfers required to explain a set of taxa in the worst case.
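To make the intuition about two separate clades sharing a character concrete, the following is a minimal sketch and not the algorithm developed in this work. It assumes a rooted tree encoded as nested tuples of leaf names, and it simplifies "rarely lost" to "never lost", so that on a tree alone a character with a unique birth must be carried by exactly the leaves of a single clade. The names `leaves`, `maximal_carrier_clades`, and `suggests_transfer` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Tuple, Union

# Assumed encoding: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxa at the leaves of `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def maximal_carrier_clades(tree: Tree, carriers: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Return the leaf sets of the maximal subtrees whose leaves all carry the character."""
    leaf_set = leaves(tree)
    if leaf_set <= carriers:
        # Every taxon below this node carries the character: one maximal clade.
        return [leaf_set]
    if isinstance(tree, str):
        # A single taxon that does not carry the character.
        return []
    clades: List[FrozenSet[str]] = []
    for child in tree:
        clades.extend(maximal_carrier_clades(child, carriers))
    return clades


def suggests_transfer(tree: Tree, carriers: FrozenSet[str]) -> bool:
    """Under the no-loss simplification, more than one maximal carrier clade
    cannot be explained by a single birth on the tree alone."""
    return len(maximal_carrier_clades(tree, carriers)) > 1


if __name__ == "__main__":
    species_tree: Tree = ((("a", "b"), "c"), (("d", "e"), "f"))
    carriers = frozenset({"a", "b", "d", "e"})
    for clade in maximal_carrier_clades(species_tree, carriers):
        print(sorted(clade))
    print(suggests_transfer(species_tree, carriers))
```

In this toy example the character appears in two disjoint clades, which the sketch flags as a candidate for a transfer. It does not place transfer edges, verify that a network explains the taxa, or count the transfers required.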