Construction of Sparse Suffix Trees and LCE Indexes in Optimal Time and Space

The notions of synchronizing and partitioning sets are recently introduced variants of locally consistent parsings with great potential in problem-solving. In this paper we propose a deterministic algorithm that, given a read-only string of length $n$ over the alphabet $\{0,1,\ldots,n^{\mathcal{O}(1)}\}$, constructs a variant of a $\tau$-partitioning set of size $\mathcal{O}(b)$ with $\tau = \frac{n}{b}$, using $\mathcal{O}(b)$ space and $\mathcal{O}(\frac{1}{\epsilon}n)$ time, provided that $b \ge n^\epsilon$ for some $\epsilon > 0$. As a corollary, for $b \ge n^\epsilon$ and constant $\epsilon > 0$, we obtain linear-time construction algorithms that use $\mathcal{O}(b)$ space on top of the string for two major small-space indexes: a sparse suffix tree, which is a compacted trie built on $b$ chosen suffixes of the string, and a longest common extension (LCE) index, which occupies $\mathcal{O}(b)$ space and computes the longest common prefix of any pair of substrings in $\mathcal{O}(n/b)$ time. For both, the $\mathcal{O}(b)$ construction space is asymptotically optimal, since the tree itself takes $\mathcal{O}(b)$ space and any LCE index with $\mathcal{O}(n/b)$ query time must occupy $\Omega(b)$ space by a known trade-off (at least for $b \ge \Omega(n / \log n)$). For arbitrary $b \ge \Omega(\log^2 n)$, we present construction algorithms for the partitioning set, sparse suffix tree, and LCE index with $\mathcal{O}(n\log_b n)$ running time and $\mathcal{O}(b)$ space, thus also improving the state of the art.
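
To make the two indexed objects concrete, the following is a minimal reference sketch of what they compute, not the construction proposed in this paper. It answers an LCE query by direct character comparison and orders a chosen set of suffixes by plain sorting, so it achieves neither the $\mathcal{O}(n/b)$ query time nor the $\mathcal{O}(b)$ space bound; the names `lce` and `sparse_suffix_array` are hypothetical and used for illustration only.

```python
def lce(s: str, i: int, j: int) -> int:
    """Length of the longest common prefix of s[i:] and s[j:].

    Naive definition only: compares characters one by one, so a query
    may take time linear in the length of s.
    """
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


def sparse_suffix_array(s: str, positions: list[int]) -> list[int]:
    """Chosen suffix start positions, ordered lexicographically by suffix.

    This is the leaf order of a sparse suffix tree built on the chosen
    suffixes. Illustration only: slicing copies each suffix, so this
    uses far more than O(b) space.
    """
    return sorted(positions, key=lambda p: s[p:])


if __name__ == "__main__":
    text = "abracadabra"
    print(lce(text, 0, 7))                        # compares "abracadabra" with "abra"
    print(sparse_suffix_array(text, [0, 3, 7]))   # orders three chosen suffixes
```

A small-space index replaces both brute-force steps with precomputed information about $\mathcal{O}(b)$ sampled positions, which is the role the partitioning set plays in the constructions summarized above.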
