Random access to highly compressed strings (represented, for example, by straight-line programs or Lempel-Ziv parses) is a well-studied topic. Random access to such strings in strongly sublogarithmic time is impossible in the worst case, but previous authors have shown how to support faster access to specific characters and their neighbourhoods. Since better compression can impede access, in this paper we explore whether we can support faster access to relatively incompressible substrings of highly compressed strings. We first show how, given a run-length compressed straight-line program (RLSLP) of size $g_{rl}$ or a block tree of size $L$, we can build an $O(g_{rl})$-space or an $O(L)$-space data structure, respectively, that supports access to any character in time logarithmic in the length of the longest repeated substring containing that character. That is, the more incongruous a character is with respect to the characters around it (in the sense that it lies in no long repeated substring), the faster we can access it. We then prove a similar but more powerful and sophisticated result for parsings in which the sources of phrases do not overlap much larger phrases; there, the query time also depends on the number of phrases we must copy from their sources to obtain the queried character.
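For readers unfamiliar with RLSLPs, the sketch below shows the standard baseline for random access that the results above improve on: precompute the expansion length of every nonterminal, then descend from the start symbol, choosing a child by comparing the position with lengths and reducing the position modulo the length of one copy at run rules. This is a minimal illustration under assumed conventions, not the data structure from this paper. The rule encoding (`"T"`, `"P"`, `"R"` tuples) and the names `lengths` and `access` are hypothetical. Its query time is proportional to the depth of the derivation, which is why achieving time logarithmic in the longest repeated substring requires more machinery.

```python
from typing import Dict, Tuple

# Hypothetical rule encoding for an RLSLP:
#   ("T", c)      terminal: expands to the single character c
#   ("P", A, B)   pair: expands to exp(A) followed by exp(B)
#   ("R", A, k)   run: expands to exp(A) repeated k times (k >= 2)
Rule = Tuple


def lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Compute the expansion length of every nonterminal (memoized)."""
    memo: Dict[str, int] = {}

    def length(x: str) -> int:
        if x not in memo:
            r = rules[x]
            if r[0] == "T":
                memo[x] = 1
            elif r[0] == "P":
                memo[x] = length(r[1]) + length(r[2])
            else:  # run rule
                memo[x] = length(r[1]) * r[2]
        return memo[x]

    for x in rules:
        length(x)
    return memo


def access(rules: Dict[str, Rule], lens: Dict[str, int], start: str, i: int) -> str:
    """Return the i-th (0-based) character of exp(start) by top-down descent.

    Each step moves one level down the derivation tree, so the time is
    proportional to the grammar's depth, not to the expanded length.
    """
    if not 0 <= i < lens[start]:
        raise IndexError(i)
    x = start
    while True:
        r = rules[x]
        if r[0] == "T":
            return r[1]
        if r[0] == "P":
            left = lens[r[1]]
            if i < left:
                x = r[1]
            else:
                i -= left
                x = r[2]
        else:
            # Run rule: jump to the right copy of A in O(1) time.
            i %= lens[r[1]]
            x = r[1]


# Example grammar for "ababababc".
rules = {
    "a": ("T", "a"), "b": ("T", "b"), "c": ("T", "c"),
    "X": ("P", "a", "b"),  # "ab"
    "Y": ("R", "X", 4),    # "abababab"
    "S": ("P", "Y", "c"),  # "ababababc"
}
lens = lengths(rules)
print(access(rules, lens, "S", 5))  # prints "b"
```

The run rule is what distinguishes an RLSLP from a plain SLP: a long periodic block costs one rule and one descent step, rather than a chain of pair rules.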