Characterizing and Transforming DAGs within the I-LCA Framework

We explore the connections between clusters and least common ancestors (LCAs) in directed acyclic graphs (DAGs), focusing on DAGs with unique LCAs for specific subsets of their leaves. These DAGs are important in modeling phylogenetic networks that account for reticulate processes or horizontal gene transfer. Phylogenetic DAGs inferred from genomic data are often complex, which obscures evolutionary insights, especially when vertices lack support as LCAs for any subset of taxa. To address this, we focus on $I$-lca-relevant DAGs, where each vertex serves as the unique LCA for a subset $A$ of leaves of a specific size $|A|\in I$. We characterize DAGs with the so-called $I$-lca-property and establish their close relationship to pre-$I$-ary and $I$-ary set systems. Moreover, we build upon recently established results that use a simple operator $\ominus$, enabling the transformation of arbitrary DAGs into $I$-lca-relevant DAGs. This process reduces unnecessary complexity while preserving the key structural properties of the original DAG. The set $C_G$ consists of all clusters in a DAG $G$, where the cluster of a vertex is the set of its descendant leaves. While in some cases $C_H = C_G$ when transforming $G$ into an $I$-lca-relevant DAG $H$, it often happens that certain clusters in $C_G$ do not appear as clusters in $H$. To understand this phenomenon in detail, we characterize the subset of clusters in $C_G$ that remain in $H$ for DAGs $G$ with the $I$-lca-property. Furthermore, we show that the set $W$ of vertices required to transform $G$ into $H = G \ominus W$ is uniquely determined for such DAGs. This, in turn, allows us to show that the transformed DAG $H$ is always a tree or a galled tree whenever $C_G$ represents the clustering system of a tree or galled tree and $G$ has the $I$-lca-property. In the latter case, $C_H = C_G$ always holds.
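To make the notions of cluster, LCA, and $I$-lca-relevance concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm. It assumes a DAG is given as a dictionary mapping each inner vertex to its list of children, that an LCA of a leaf set $A$ is a vertex whose cluster contains $A$ while none of its children's clusters does, and that $G$ is $I$-lca-relevant when every vertex is the unique LCA of some $A$ with $|A| \in I$. The names `clusters`, `lcas`, and `is_I_lca_relevant` are hypothetical helpers, and the check enumerates leaf subsets by brute force.

```python
from itertools import combinations

def clusters(children):
    """Map every vertex to its cluster: the set of leaves reachable from it."""
    memo = {}
    def visit(v):
        if v not in memo:
            kids = children.get(v, [])
            # A leaf's cluster is the leaf itself (assumed convention).
            memo[v] = (frozenset([v]) if not kids
                       else frozenset().union(*(visit(c) for c in kids)))
        return memo[v]
    for v in children:
        visit(v)
    return memo

def lcas(children, cl, A):
    """Vertices whose cluster contains A while no child's cluster does."""
    A = frozenset(A)
    return {v for v in cl
            if A <= cl[v] and not any(A <= cl[c] for c in children.get(v, []))}

def is_I_lca_relevant(children, I):
    """Brute-force check: is every vertex the unique LCA of some A with |A| in I?"""
    cl = clusters(children)
    leaves = [v for v in cl if not children.get(v)]
    supported = set()
    for k in I:
        for A in combinations(leaves, k):
            found = lcas(children, cl, A)
            if len(found) == 1:
                supported |= found
    return supported == set(cl)

# Hypothetical example: u and w have the same cluster {a, b}, so u is never an LCA.
G = {"r": ["u", "c"], "u": ["w"], "w": ["a", "b"]}
# The same DAG with u bypassed (r is linked directly to w).
H = {"r": ["w", "c"], "w": ["a", "b"]}

print(is_I_lca_relevant(G, {1, 2}))  # False
print(is_I_lca_relevant(H, {1, 2}))  # True
```

In this toy case, bypassing the unsupported vertex `u` yields a $\{1,2\}$-lca-relevant DAG with the same clusters, which loosely mirrors the situation $C_H = C_G$ described above. The enumeration is exponential in the number of leaves and is meant only for small examples.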
