Sublinear-time quantum algorithms have been established for many fundamental problems on strings. This work demonstrates that new, faster quantum algorithms can be designed when the string is highly compressible. We focus on two popular and theoretically significant compression algorithms, the Lempel-Ziv77 algorithm (LZ77) and the Run-length-encoded Burrows-Wheeler Transform (RL-BWT), and obtain the following results. First, we provide a quantum algorithm running in $\tilde{O}(\sqrt{zn})$ time for finding the LZ77 factorization of an input string $T[1..n]$ with $z$ factors. Combined with several existing results, this yields an $\tilde{O}(\sqrt{rn})$ time quantum algorithm for finding the RL-BWT encoding with $r$ BWT runs. Note that $r = \tilde{\Theta}(z)$. We complement these results with lower bounds proving that our algorithms are optimal up to polylogarithmic factors. Next, we study the problem of compressed indexing, where we provide an $\tilde{O}(\sqrt{rn})$ time quantum algorithm for constructing a recently designed $\tilde{O}(r)$ space data structure with capabilities equivalent to those of the suffix tree. This data structure is then applied to numerous problems to obtain sublinear-time quantum algorithms when the input is highly compressible. For example, we show that the longest common substring of two strings of total length $n$ can be computed in $\tilde{O}(\sqrt{zn})$ time, where $z$ is the number of factors in the LZ77 factorization of their concatenation. This beats the best known $\tilde{O}(n^{2/3})$ time quantum algorithm when $z$ is sufficiently small.
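To make the parameters $z$ and $r$ concrete, the following is a minimal classical sketch in Python. It is not the quantum algorithm described above and makes no attempt at sublinear time. It assumes the greedy, self-referential variant of LZ77, in which each factor is either a fresh character or the longest prefix of the remaining text that also starts at an earlier position, and it assumes a BWT built from sorted rotations with an appended sentinel `$`. The function names are illustrative only.

```python
from itertools import groupby


def lz77_factorize(text):
    """Greedy LZ77 parse by brute force (illustrative, far from efficient).

    Returns a list of (start, length, source) triples. `source` is None for
    a literal character, otherwise an earlier position where the factor
    also occurs. Overlap between source and factor is allowed (assumption).
    """
    factors = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_src = 0, None
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            factors.append((i, 1, None))  # character not seen before
            i += 1
        else:
            factors.append((i, best_len, best_src))
            i += best_len
    return factors


def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[k:] + s[:k] for k in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_length_encode(s):
    """Collapse maximal runs of equal characters into (char, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


if __name__ == "__main__":
    t = "banana"
    z = len(lz77_factorize(t))          # number of LZ77 factors
    r = len(run_length_encode(bwt(t)))  # number of BWT runs
    print(z, r)
```

On a highly repetitive input such as `"abababab"`, the parse consists of two literal characters followed by a single long self-overlapping factor, which is the regime in which $z$ and $r$ are small relative to $n$.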