A Straight-Line Program (SLP) $G$ for a string $T$ is a context-free grammar (CFG) that derives only $T$, and can be regarded as a compressed representation of $T$. In this paper, we show how to encode $G$ in $n \lceil \lg N \rceil + (n + n') \lceil \lg (n+σ) \rceil + 4n - 2n' + o(n)$ bits so as to support random access queries that extract $T[p..q]$ in worst-case $O(\log N + q - p)$ time, where $N$ is the length of $T$, $σ$ is the alphabet size, $n$ is the number of variables in $G$, and $n' \le n$ is the number of symmetric centroid paths in the DAG representation of $G$. The time complexity is almost optimal because Verbin and Yu [CPM 2013] proved that the $O(\log N)$ term cannot be significantly improved in general with $\mathrm{poly}(n)$-space data structures. We also present alternative encodings that achieve the same random access time with $n \lceil \lg N \rceil + n \lceil \lg (n+σ) \rceil + 5n + n' + o(n)$ or $n \lceil \lg N \rceil + n \lceil \lg (n+σ) \rceil + 5n - n' + σ + o(n+σ)$ bits of space.
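To make the notions of an SLP and of a random access query $T[p..q]$ concrete, the following is a minimal Python sketch. It is not the encoding proposed in the paper: it stores plain arrays, uses no symmetric centroid paths, and descends the derivation tree naively, so its running time is proportional to the height of the derivation tree plus $q - p$ rather than $O(\log N + q - p)$. The names `expansion_lengths` and `extract`, the list-based rule format, and the 0-based inclusive indexing are assumptions made only for this illustration.

```python
from typing import List, Tuple, Union

# Hypothetical rule format for this sketch: variable i is either a single
# terminal character, or a pair (left, right) of variables with smaller index.
Rule = Union[str, Tuple[int, int]]


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each variable, computed bottom-up."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def extract(rules: List[Rule], lengths: List[int], root: int, p: int, q: int) -> str:
    """Return T[p..q] (0-based, inclusive) without expanding all of T."""
    if not 0 <= p <= q < lengths[root]:
        raise IndexError("query range outside the derived string")
    out: List[str] = []
    # Each stack entry is (variable, lo, hi): the part of that variable's
    # expansion that still has to be output.
    stack = [(root, p, q)]
    while stack:
        var, lo, hi = stack.pop()
        rule = rules[var]
        if isinstance(rule, str):
            out.append(rule)
            continue
        left, right = rule
        left_len = lengths[left]
        # Push the right child first so that the left child is handled first.
        if hi >= left_len:
            stack.append((right, max(lo - left_len, 0), hi - left_len))
        if lo < left_len:
            stack.append((left, lo, min(hi, left_len - 1)))
    return "".join(out)


# Toy grammar deriving the Fibonacci-like string "abaababa" (N = 8).
rules: List[Rule] = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
lengths = expansion_lengths(rules)
print(extract(rules, lengths, 5, 2, 5))  # prints "aaba"
```

In this toy grammar the derivation tree is as tall as the number of pair rules, which is the kind of unbalanced case where a naive descent is slow and a worst-case $O(\log N + q - p)$ bound becomes relevant.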