String 2-Covers with No Length Restrictions

A $\lambda$-cover of a string $S$ is a set of strings $\{C_i\}_{i=1}^{\lambda}$ such that every index in $S$ is contained in an occurrence of at least one string $C_i$. The existence of a $1$-cover defines the well-known class of quasi-periodic strings. Quasi-periodicity can be decided in linear time, and all $1$-covers of a string can be reported in linear time plus the size of the output. Since in general it is NP-complete to decide whether a string has a $\lambda$-cover, the natural next step is the development of efficient algorithms for $2$-covers. Radoszewski and Straszy\'nski [ESA 2020] analysed the particular case where the two strings in a $2$-cover must have the same length. They provided an algorithm that reports all such $2$-covers of $S$ in time near-linear in $|S|$ and in the size of the output. In this work, we consider $2$-covers in full generality. Since every length-$n$ string has $\Omega(n^2)$ trivial $2$-covers (any prefix and any suffix whose total length is at least $n$ together form such a $2$-cover), we state the reporting problem as follows: given a string $S$ and a number $m$, report all $2$-covers $\{C_1,C_2\}$ of $S$ whose total length $|C_1|+|C_2|$ is at most $m$. We present an $\tilde{O}(n + \mathrm{Output})$ time algorithm solving this problem, where $\mathrm{Output}$ is the size of the output. This algorithm admits a simpler modification that finds a $2$-cover of minimum total length. We also provide an $\tilde{O}(n)$ time construction of a $2$-cover oracle which, given two substrings $C_1,C_2$ of $S$, reports in poly-logarithmic time whether $\{C_1,C_2\}$ is a $2$-cover of $S$.
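
To make the definitions concrete, the following is a minimal brute-force sketch in Python. It is not the near-linear algorithm or the oracle announced in the abstract; it only spells out what a cover is, what the oracle query answers, and what the reporting problem asks for. The function names (`occurrence_starts`, `is_cover`, `two_covers_up_to`) are hypothetical, and treating a pair with $C_1 = C_2$ as a valid (degenerate) $2$-cover is an assumption made here for simplicity.

```python
def occurrence_starts(s: str, c: str) -> list[int]:
    """Return the start positions of all (possibly overlapping) occurrences of c in s."""
    starts = []
    pos = s.find(c)
    while pos != -1:
        starts.append(pos)
        pos = s.find(c, pos + 1)  # advance by one so overlapping occurrences are found
    return starts


def is_cover(s: str, parts: list[str]) -> bool:
    """Naive check: does every index of s lie inside an occurrence of some string in parts?"""
    covered = [False] * len(s)
    for c in parts:
        if not c:
            continue  # an empty string covers nothing
        for start in occurrence_starts(s, c):
            for i in range(start, start + len(c)):
                covered[i] = True
    return all(covered)


def two_covers_up_to(s: str, m: int) -> set[tuple[str, str]]:
    """Brute-force reporting: all pairs (c1, c2), c1 <= c2, of substrings of s
    that cover s and satisfy len(c1) + len(c2) <= m.

    Assumption: the degenerate pair c1 == c2 (a 1-cover) is kept in the output.
    This takes polynomial but far from near-linear time; it is for illustration only.
    """
    n = len(s)
    substrings = {s[i:j] for i in range(n) for j in range(i + 1, n + 1)}
    result = set()
    for c1 in substrings:
        for c2 in substrings:
            if c1 <= c2 and len(c1) + len(c2) <= m and is_cover(s, [c1, c2]):
                result.add((c1, c2))
    return result
```

For example, `is_cover("abcab", ["ab", "c"])` holds, because the occurrences of `ab` cover indices 0, 1, 3, 4 and the occurrence of `c` covers index 2, whereas `ab` alone leaves index 2 uncovered. The same check confirms the trivial $2$-covers: any prefix and suffix of `"abcab"` whose lengths sum to at least 5 pass `is_cover`.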
