Size-constrained Weighted Ancestors with Applications

The weighted ancestor problem on a rooted node-weighted tree $T$ is a generalization of the classic predecessor problem: construct a data structure for a set of integers that supports fast predecessor queries. Both problems are known to require $\Omega(\log\log n)$ query time when $\mathcal{O}(n\,\text{polylog}\, n)$ space is available, where $n$ is the input size. The weighted ancestor problem has attracted a lot of attention from the combinatorial pattern matching community due to its direct application to suffix trees. In that formulation of the problem, the nodes are weighted by string depth. This attention has culminated in a data structure for weighted ancestors in suffix trees with $\mathcal{O}(1)$ query time and an $\mathcal{O}(n)$-time construction algorithm [Belazzougui et al., CPM 2021]. In this paper, we consider a different version of the weighted ancestor problem, in which the nodes are weighted by any function $\textsf{weight}$ that maps the nodes of $T$ to positive integers such that $\textsf{weight}(u)\le \textsf{size}(u)$ for any node $u$, and $\textsf{weight}(u_1)\le \textsf{weight}(u_2)$ whenever node $u_1$ is a descendant of node $u_2$, where $\textsf{size}(u)$ is the number of nodes in the subtree rooted at $u$. In the size-constrained weighted ancestor (SWAQ) problem, given any node $u$ of $T$ and any integer $k$, we are asked to return the lowest ancestor $w$ of $u$ with weight at least $k$. We show that for any rooted tree with $n$ nodes, we can locate node $w$ in $\mathcal{O}(1)$ time after $\mathcal{O}(n)$-time preprocessing. In particular, this implies a data structure for the SWAQ problem in suffix trees with $\mathcal{O}(1)$ query time and $\mathcal{O}(n)$-time preprocessing, when the nodes are weighted by $\textsf{weight}$. We also show several string-processing applications of this result.
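
To make the query semantics concrete, the following is a minimal sketch of a naive baseline for the SWAQ problem. It is not the $\mathcal{O}(1)$-time data structure of the paper: it simply walks up from $u$ and therefore takes time proportional to the depth of $u$. The parent-array representation, the function names (`subtree_sizes`, `is_valid_weight`, `swaq_naive`), the choice $\textsf{weight} = \textsf{size}$ for the example, and the convention that a node counts as its own ancestor are all assumptions made for illustration.

```python
from typing import List, Optional


def subtree_sizes(parent: List[int]) -> List[int]:
    """Return size[v] = number of nodes in the subtree rooted at v.

    Assumed representation: parent[v] is the parent of v, and the root has parent -1.
    """
    n = len(parent)
    pending = [0] * n  # number of children not yet folded into each node
    for p in parent:
        if p >= 0:
            pending[p] += 1
    size = [1] * n
    stack = [v for v in range(n) if pending[v] == 0]  # start from the leaves
    while stack:
        v = stack.pop()
        p = parent[v]
        if p >= 0:
            size[p] += size[v]
            pending[p] -= 1
            if pending[p] == 0:  # all children of p have been processed
                stack.append(p)
    return size


def is_valid_weight(parent: List[int], weight: List[int]) -> bool:
    """Check the two constraints: 1 <= weight(u) <= size(u), and weights do not
    decrease when moving from a node to its parent."""
    size = subtree_sizes(parent)
    for v, p in enumerate(parent):
        if not (1 <= weight[v] <= size[v]):
            return False
        if p >= 0 and weight[v] > weight[p]:
            return False
    return True


def swaq_naive(parent: List[int], weight: List[int], u: int, k: int) -> Optional[int]:
    """Return the lowest ancestor w of u with weight[w] >= k, or None if none exists.

    Assumption: u is treated as an ancestor of itself.
    """
    v = u
    while v != -1:
        if weight[v] >= k:
            return v
        v = parent[v]
    return None


# Hypothetical example tree: 0 is the root; 1 and 2 are its children;
# 3 and 4 are children of 1; 5 is the child of 2.
parent = [-1, 0, 0, 1, 1, 2]
weight = subtree_sizes(parent)  # weight = size trivially satisfies both constraints

print(swaq_naive(parent, weight, 3, 2))
print(swaq_naive(parent, weight, 3, 4))
```

Because the weights are non-decreasing along every leaf-to-root path, the first node on the upward walk whose weight reaches $k$ is the lowest qualifying ancestor, which is why a simple linear scan is correct here.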
