The Lempel--Ziv 78 (LZ78) factorization is a well-studied technique for data compression. It and its derivatives are used in compression formats such as "compress" and "gif". While most research focuses on factorizing plain data, little work has been done on indexing the data for fast LZ78 factorization. Here, we study the LZ78 factorization and its derivatives in the substring compression model, where we are allowed to index the data and must return the factorization of a substring specified at query time. In that model, we propose an algorithm that works in compressed space and computes the factorization with a logarithmic slowdown compared to the optimal time complexity.
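To make the query in the substring compression model concrete, the following minimal sketch computes the LZ78 factorization of a substring from scratch with a dictionary-based trie. It is only an index-free baseline for illustration, not the compressed-space algorithm proposed here; the function names `lz78_factorize` and `lz78_substring`, the 0-based half-open interval convention, and the (referred factor, new character) output encoding are assumptions made for this example.

```python
def lz78_factorize(text):
    """Return the LZ78 factors of `text` as (referred_factor, new_char) pairs.

    Factors are numbered from 1; a reference to 0 denotes the empty string.
    If the text ends inside an already known factor, the last pair carries
    an empty string as its new character.
    """
    # Trie stored as a map (parent_node, char) -> child_node.
    # Node 0 is the root; node k corresponds to the k-th factor.
    trie = {}
    factors = []
    node = 0
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            # The current factor can still be extended: walk down the trie.
            node = child
        else:
            # Longest known prefix found: emit it plus one new character.
            factors.append((node, ch))
            trie[(node, ch)] = len(factors)
            node = 0
    if node != 0:
        # Leftover prefix at the end of the text without a new character.
        factors.append((node, ""))
    return factors


def lz78_substring(text, start, end):
    """Naive substring query: factorize text[start:end] without any index."""
    return lz78_factorize(text[start:end])
```

For instance, `lz78_factorize("ababcbababaa")` yields the factors a, b, ab, c, ba, bab, aa. Each call to `lz78_substring` here takes time proportional to the length of the queried substring; the point of building an index is to do better than this when the factorization has far fewer factors than the substring has characters.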