Grammar-based compression is a widely accepted model of string compression that allows efficient, direct manipulation of the compressed data. Most, if not all, such manipulations rely on \emph{random access} queries as a primitive: quickly returning the character at a specified position of the original uncompressed string without explicit decompression. While there are advanced data structures for random access to grammar-compressed strings with guaranteed theoretical bounds on query time and space, little has been done from the \emph{practical} perspective on this important problem. In this paper, we revisit a well-known folklore random access algorithm for grammars in Chomsky normal form, modify it to work directly on general grammars, and show that the modified version is fast and memory efficient in practice.
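For illustration, the folklore algorithm for grammars in Chomsky normal form can be sketched as follows: store the expansion length of every nonterminal, then descend from the start symbol, going left or right depending on whether the queried position falls inside the left child's expansion. This is a minimal sketch of the textbook Chomsky-normal-form variant only, not the modified general-grammar version studied in the paper. The rule encoding (a list in which each entry is either a one-character string or a pair of earlier indices, with the last entry as the start symbol) and the names `expansion_lengths` and `access` are assumptions made for this example.

```python
def expansion_lengths(rules):
    """Return the expansion length of every nonterminal.

    Assumed encoding (hypothetical, for this sketch only):
    rules[k] is either a 1-character str (X_k -> a) or a pair (i, j)
    with i, j < k (X_k -> X_i X_j), so children precede parents.
    """
    lengths = []
    for rhs in rules:
        if isinstance(rhs, str):
            lengths.append(1)
        else:
            left, right = rhs
            lengths.append(lengths[left] + lengths[right])
    return lengths


def access(rules, lengths, pos):
    """Return the character at 0-based position pos of the expanded string.

    The start symbol is assumed to be the last rule. The loop runs once
    per level of the derivation tree, so the cost is proportional to the
    grammar's height.
    """
    symbol = len(rules) - 1
    if not 0 <= pos < lengths[symbol]:
        raise IndexError("position out of range")
    while not isinstance(rules[symbol], str):
        left, right = rules[symbol]
        if pos < lengths[left]:
            symbol = left            # position lies in the left child
        else:
            pos -= lengths[left]     # skip the left child's expansion
            symbol = right
    return rules[symbol]


# Toy grammar: X2 -> ab, X3 -> X2 a = aba, X4 -> X3 X2 = abaab
rules = ["a", "b", (0, 1), (2, 0), (3, 2)]
lengths = expansion_lengths(rules)
text = "".join(access(rules, lengths, i) for i in range(lengths[-1]))
```

Here `text` is rebuilt one query at a time purely to check the sketch; in actual use, `access` would be called for individual positions without ever materializing the full string.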