The notions of synchronizing and partitioning sets are recently introduced variants of locally consistent parsing with great potential for algorithmic problem solving. In this paper, we propose a deterministic algorithm that, given a read-only string of length $n$ over the alphabet $\{0,1,\ldots,n^{\mathcal{O}(1)}\}$, constructs a variant of a $\tau$-partitioning set of size $\mathcal{O}(b)$ with $\tau = \frac{n}{b}$ using $\mathcal{O}(b)$ space and $\mathcal{O}(\frac{1}{\epsilon}n)$ time, provided $b \ge n^\epsilon$ for some $\epsilon > 0$. As a corollary, for $b \ge n^\epsilon$ and constant $\epsilon > 0$, we obtain linear-time construction algorithms that use $\mathcal{O}(b)$ space on top of the string for two major small-space indexes: a sparse suffix tree, which is a compacted trie built on $b$ chosen suffixes of the string, and a longest common extension (LCE) index, which occupies $\mathcal{O}(b)$ space and computes the longest common prefix of any pair of substrings in $\mathcal{O}(n/b)$ time. In both cases, the $\mathcal{O}(b)$ construction space is asymptotically optimal, since the tree itself takes $\mathcal{O}(b)$ space and, by a known trade-off, any LCE index with $\mathcal{O}(n/b)$ query time must occupy $\Omega(b)$ space (at least for $b = \Omega(n / \log n)$). For arbitrary $b = \Omega(\log^2 n)$, we present construction algorithms for the partitioning set, the sparse suffix tree, and the LCE index with $\mathcal{O}(n\log_b n)$ running time and $\mathcal{O}(b)$ space, thus also improving the state of the art.
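To make the two indexed objects concrete, the following is a minimal naive sketch in Python of what they compute. It is not the partitioning-set construction described above. The function names `lce` and `sparse_suffix_array` are hypothetical. `lce` answers one query by direct character comparison in $\mathcal{O}(n)$ time. `sparse_suffix_array` sorts the $b$ chosen suffixes and records the LCP of lexicographically adjacent ones. This sorted order, together with its LCP values, determines the shape of the sparse suffix tree. The sketch slices the string for sorting, so it uses far more than $\mathcal{O}(b)$ extra space.

```python
def lce(s: str, i: int, j: int) -> int:
    """Longest common extension: length of the longest common prefix of s[i:] and s[j:].

    Naive character-by-character comparison, O(n) time per query.
    """
    n = len(s)
    k = 0
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


def sparse_suffix_array(s: str, positions: list[int]) -> tuple[list[int], list[int]]:
    """Sort the chosen suffixes lexicographically and compute LCPs of adjacent ones.

    The sorted order plus these LCP values determine the topology of the sparse
    suffix tree (compacted trie) over the chosen suffixes. Sorting by slices
    s[p:] is a naive baseline that copies suffixes, so it uses O(n) extra space
    per key, not O(b) space in total.
    """
    order = sorted(positions, key=lambda p: s[p:])
    lcp = [lce(s, order[k - 1], order[k]) for k in range(1, len(order))]
    return order, lcp


if __name__ == "__main__":
    text = "banana"
    print(lce(text, 1, 3))                          # "anana" vs "ana"
    print(sparse_suffix_array(text, [0, 2, 4, 5]))  # suffixes banana, nana, na, a
```

For `"banana"` with chosen positions `[0, 2, 4, 5]`, the chosen suffixes sort as `a`, `banana`, `na`, `nana`. Only the last two share a nonempty prefix (`na`), which appears as a single branching node in the sparse suffix tree.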