Rooted phylogenetic networks, or more generally, directed acyclic graphs (DAGs), are widely used to model species or gene relationships that traditional rooted trees cannot fully capture, especially in the presence of reticulate processes or horizontal gene transfers. Such networks or DAGs are typically inferred from genomic data of extant taxa, providing only an estimate of the true evolutionary history. However, these inferred DAGs are often complex and difficult to interpret. In particular, many contain vertices that do not serve as least common ancestors (LCAs) for any subset of the underlying genes or species, thus lacking direct support from the observed data. In contrast, LCA vertices represent ancestral states substantiated by the data, offering important insights into evolutionary relationships among subsets of taxa. To reduce unnecessary complexity and eliminate unsupported vertices, we aim to simplify a DAG to retain only LCA vertices while preserving essential evolutionary information. In this paper, we characterize $\mathrm{LCA}$-relevant and $\mathrm{lca}$-relevant DAGs, defined as those in which every vertex serves as an LCA (or unique LCA) for some subset of taxa. We introduce methods to identify LCAs in DAGs and efficiently transform any DAG into an $\mathrm{LCA}$-relevant or $\mathrm{lca}$-relevant one while preserving key structural properties of the original DAG or network. This transformation is achieved using a simple operator ``$\ominus$'' that mimics vertex suppression.
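To make the notions above concrete, the following is a minimal brute-force sketch, not the efficient method of the paper. It assumes that a DAG is given as a dictionary mapping each vertex to a list of its children, that the taxa are the vertices without children, and that $G \ominus v$ deletes $v$ and joins every parent of $v$ to every child of $v$; the abstract only states that the operator mimics vertex suppression, so this reading is an assumption. The function names `descendants`, `lca_sets`, `lca_vertices`, and `suppress` are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def descendants(children, v):
    """Return all vertices reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def lca_sets(children):
    """Map every non-empty subset A of leaves to its set of least common ancestors."""
    vertices = set(children) | {c for cs in children.values() for c in cs}
    leaves = sorted(v for v in vertices if not children.get(v))
    desc = {v: descendants(children, v) for v in vertices}
    result = {}
    for k in range(1, len(leaves) + 1):
        for subset in combinations(leaves, k):
            a = frozenset(subset)
            # Common ancestors of A: vertices having all of A among their descendants.
            common = {v for v in vertices if a <= desc[v]}
            # Keep those with no proper descendant that is also a common ancestor.
            result[a] = {v for v in common if not (desc[v] - {v}) & common}
    return result


def lca_vertices(children, unique=False):
    """Vertices that are an LCA (or, with unique=True, the unique LCA) of some leaf subset."""
    found = set()
    for lcas in lca_sets(children).values():
        if not unique or len(lcas) == 1:
            found |= lcas
    return found


def suppress(children, v):
    """Assumed reading of G ⊖ v: delete v and connect each parent of v to each child of v."""
    kids = children.get(v, [])
    new = {}
    for u, cs in children.items():
        if u == v:
            continue
        out = [c for c in cs if c != v]
        if v in cs:
            out += [c for c in kids if c not in out]
        new[u] = out
    return new


# Toy DAG (hypothetical): u and w are not an LCA of any subset of the leaves x, y, z.
g = {"r": ["u", "z"], "u": ["w"], "w": ["a", "b"], "a": ["x", "y"], "b": ["x", "y"]}
unsupported = (set(g) | {c for cs in g.values() for c in cs}) - lca_vertices(g)
simplified = g
for v in sorted(unsupported):
    simplified = suppress(simplified, v)
```

In this toy example, `a` and `b` are both least common ancestors of $\{x, y\}$, so they are LCA vertices but not unique-LCA vertices, which illustrates the difference between the $\mathrm{LCA}$-relevant and $\mathrm{lca}$-relevant notions. The subset enumeration is exponential in the number of leaves and is meant only to mirror the definitions.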