The Cover Suffix Tree (CST) of a string $T$ is the suffix tree of $T$ with additional explicit nodes corresponding to halves of square substrings of $T$. In the CST, an explicit node corresponding to a substring $C$ of $T$ is annotated with two numbers: the number of non-overlapping consecutive occurrences of $C$ and the total number of positions in $T$ that are covered by occurrences of $C$. Kociumaka et al. (Algorithmica, 2015) showed how to compute the CST of a length-$n$ string in $O(n \log n)$ time. We show how to compute the CST in $O(n)$ time, assuming that $T$ is over an integer alphabet. Kociumaka et al. (Algorithmica, 2015; Theor. Comput. Sci., 2018) showed that, given the CST of a length-$n$ string $T$, one can compute in $O(n)$ time a linear-sized representation of all seeds of $T$, as well as all shortest $\alpha$-partial covers and seeds in $T$ for a given $\alpha$. Thus our result implies linear-time algorithms for computing these notions of quasiperiodicity. The resulting algorithm for computing seeds is substantially different from the previous one (Kociumaka et al., SODA 2012; ACM Trans. Algorithms, 2020). Kociumaka et al. (Algorithmica, 2015) also proposed an $O(n \log n)$-time algorithm for computing a shortest $\alpha$-partial cover for each $\alpha=1,\ldots,n$; we improve this complexity to $O(n)$. Our results are based on a new characterization of consecutive overlapping occurrences of a substring $S$ of $T$ in terms of the set of runs in $T$ (see Kolpakov and Kucherov, FOCS 1999). This insight also yields an $O(n)$-sized index that reports the overlapping consecutive occurrences of a given pattern $P$ of length $m$ in $O(m+\mathit{output})$ time, where $\mathit{output}$ is the number of occurrences reported. In comparison, the general index of Navarro and Thankachan (Theor. Comput. Sci., 2016) for reporting bounded-gap consecutive occurrences uses $O(n \log n)$ space.
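To make the two CST annotations and the notion of overlapping consecutive occurrences concrete, here is a minimal brute-force sketch for a single substring $C$. It is only an illustrative reference computation, not the $O(n)$ CST construction or the index described above. The function names are hypothetical, positions are 0-based, and the pattern is assumed non-empty. The convention that "non-overlapping consecutive occurrences" are counted as consecutive pairs $(i, j)$ with $j - i \ge |C|$ is also an assumption; the original papers may count them slightly differently (for example, by also counting the last occurrence).

```python
from typing import List, Tuple


def occurrences(text: str, pattern: str) -> List[int]:
    """Sorted 0-based start positions of all (possibly overlapping) occurrences."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def covered_positions(text: str, pattern: str) -> int:
    """Number of positions of text lying inside at least one occurrence."""
    m = len(pattern)
    covered = 0
    reach = 0  # first position not yet counted as covered
    for i in occurrences(text, pattern):
        # Occurrences are sorted, so i + m strictly increases; count only new cells.
        covered += i + m - max(i, reach)
        reach = i + m
    return covered


def consecutive_pairs(text: str, pattern: str) -> List[Tuple[int, int]]:
    """Pairs (i, j) of occurrences with no other occurrence strictly between them."""
    occ = occurrences(text, pattern)
    return list(zip(occ, occ[1:]))


def overlapping_consecutive(text: str, pattern: str) -> List[Tuple[int, int]]:
    """Consecutive pairs whose occurrences overlap (j - i < |pattern|)."""
    m = len(pattern)
    return [(i, j) for i, j in consecutive_pairs(text, pattern) if j - i < m]


def non_overlapping_consecutive(text: str, pattern: str) -> int:
    """Assumed convention: count consecutive pairs with j - i >= |pattern|."""
    m = len(pattern)
    return sum(1 for i, j in consecutive_pairs(text, pattern) if j - i >= m)


if __name__ == "__main__":
    t, c = "abaababaab", "aba"
    print(occurrences(t, c))                  # [0, 3, 5]
    print(covered_positions(t, c))            # 8
    print(overlapping_consecutive(t, c))      # [(3, 5)]
    print(non_overlapping_consecutive(t, c))  # 1
```

In this example, the occurrences of `aba` at positions 0 and 3 are adjacent, so `abaaba` starting at position 0 is a square whose half is `aba`. The occurrences at positions 3 and 5 overlap. Each query rescans the text, so this sketch runs in roughly $O(n|C|)$ time per substring. It is useful only as a correctness check on small inputs.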