We revisit the classic border tree data structure [Gu, Farach, Beigel, SODA 1994] that answers the following prefix-suffix queries on a string $T$ of length $n$ over an integer alphabet $\Sigma=[0,\sigma)$: for any $i,j \in [0,n)$, return all occurrences of $T$ in $T[0\mathinner{.\,.} i]T[j\mathinner{.\,.} n-1]$. The border tree of $T$ can be constructed in $\mathcal{O}(n)$ time and answers prefix-suffix queries in $\mathcal{O}(\log n + \textsf{Occ})$ time, where $\textsf{Occ}$ is the number of occurrences of $T$ in $T[0\mathinner{.\,.} i]T[j\mathinner{.\,.} n-1]$. Our contribution is a completely different and remarkably simple data structure that can be constructed in the optimal $\mathcal{O}(n/\log_\sigma n)$ time and supports queries in the optimal $\mathcal{O}(1)$ time. Our result is based on a new structural lemma that lets us encode the output of any query in constant time and space. We also show a new direct application of our result to pattern matching on node-labeled graphs.
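To make the query definition concrete, here is a naive reference sketch in Python. It is not the border tree and not the data structure proposed here: it simply builds the concatenation $T[0\mathinner{.\,.} i]T[j\mathinner{.\,.} n-1]$ explicitly and scans it for occurrences of $T$. The function name `prefix_suffix_query_naive` and the choice to report occurrences as 0-based starting positions in the concatenated string are assumptions made for illustration only.

```python
def prefix_suffix_query_naive(T: str, i: int, j: int) -> list[int]:
    """Return the 0-based starting positions of all occurrences of T
    in the string T[0..i] + T[j..n-1] (both ranges inclusive).

    Brute-force baseline for checking answers on small inputs; it makes
    no attempt to match the query times discussed in the text.
    """
    n = len(T)
    if not (0 <= i < n and 0 <= j < n):
        raise ValueError("i and j must lie in [0, n)")

    # Prefix T[0..i] followed by suffix T[j..n-1].
    S = T[: i + 1] + T[j:]

    occurrences = []
    pos = S.find(T)
    while pos != -1:
        occurrences.append(pos)
        # Restart one position later so overlapping occurrences are found.
        pos = S.find(T, pos + 1)
    return occurrences


if __name__ == "__main__":
    # T = "abab", prefix "ab" + suffix "abab" gives "ababab".
    print(prefix_suffix_query_naive("abab", 1, 0))  # [0, 2]
```

Such a baseline is mainly useful for testing a faster implementation: for a small $T$, one can compare its output against the fast structure for every pair $(i,j)$.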