A least common ancestor (LCA) of two leaves in a directed acyclic graph (DAG) is a vertex that is an ancestor of both leaves and has no proper descendant that is also a common ancestor of them. LCAs capture hierarchical relationships in rooted trees and, more generally, in DAGs. In 1981, Aho et al. introduced the problem of determining whether a set of pairwise LCA constraints on a set $X$, of the form $(i,j)<(k,l)$ with $i,j,k,l\in X$, can be realized by a rooted tree with leaf set $X$, such that whenever $(i,j)<(k,l)$, the LCA of $i$ and $j$ is a descendant of the LCA of $k$ and $l$. They also presented a polynomial-time algorithm, BUILD, that solves this problem. However, many such constraint systems cannot be realized by any tree, which raises the question of whether they can be realized by a more general DAG. We extend Aho et al.'s framework from trees to DAGs, providing both theoretical and algorithmic foundations for reasoning about LCA constraints in this broader setting. Given a collection $R$ of LCA constraints, we define its $+$-closure $R^+$, which captures additional LCA relations implied by $R$. Using $R^+$, we construct a canonical DAG $G_R$ and prove that $R$ is DAG-realizable if and only if it is realized by $G_R$. We further adapt this construction to phylogenetic networks, defining a canonical network $N_R$ and proving that it is regular, i.e., that it coincides with the Hasse diagram of its underlying set system. Finally, we show that for any DAG-realizable $R$, its classical closure (the set of all LCA constraints that hold in every DAG realizing $R$) coincides with its $+$-closure. All constructions are computable in polynomial time, and we provide explicit algorithms for each. All algorithms developed in this paper are implemented in the freely available Python package RealLCA.
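To make the definitions above concrete, the following is a minimal sketch in plain Python, not the RealLCA API. It rests on assumptions that the abstract does not spell out: a vertex counts as its own ancestor, and a constraint $(i,j)<(k,l)$ is taken to hold in a DAG when both pairs have a unique LCA and the first is a proper descendant of the second. The function names `descendants`, `lcas`, and `satisfies`, and the toy DAG, are hypothetical and chosen only for illustration.

```python
from itertools import chain


def descendants(dag, v):
    """Return all vertices reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in dag.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def lcas(dag, x, y):
    """Return the set of least common ancestors of x and y in the DAG."""
    vertices = set(dag) | set(chain.from_iterable(dag.values()))
    desc = {v: descendants(dag, v) for v in vertices}
    common = {v for v in vertices if x in desc[v] and y in desc[v]}
    # Keep a common ancestor only if no proper descendant is also common.
    return {v for v in common
            if not any(w != v and w in common for w in desc[v])}


def satisfies(dag, constraint):
    """Check (i, j) < (k, l) under the assumed reading: both LCAs are
    unique and the first is a proper descendant of the second."""
    (i, j), (k, l) = constraint
    first, second = lcas(dag, i, j), lcas(dag, k, l)
    if len(first) != 1 or len(second) != 1:
        return False
    (u,), (v,) = first, second
    return u != v and u in descendants(dag, v)


# Hypothetical toy DAG, given as parent -> children; the leaves are a, b, c.
# Leaf b has two parents, so this graph is not a tree.
dag = {"r": ["u", "v"], "u": ["a", "b"], "v": ["b", "c"]}

print(lcas(dag, "a", "b"))
print(satisfies(dag, (("a", "b"), ("a", "c"))))
print(satisfies(dag, (("b", "c"), ("a", "c"))))
```

Under the assumed reading, this toy DAG satisfies both $(a,b)<(a,c)$ and $(b,c)<(a,c)$. No rooted tree on $\{a,b,c\}$ can satisfy both, because in a tree the first constraint forces the LCA of $b$ and $c$ to equal the LCA of $a$ and $c$. The brute-force reachability computation is meant for readability only and is not tuned for large graphs.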