We study the complexity of constructing an optimal parsing $\varphi$ of a string ${\bf s} = s_1 \dots s_n$ under the following constraint: given a position $p$ in the original text and the LZ76-like (Lempel-Ziv 76) encoding of ${\bf s}$ based on $\varphi$, it must be possible to identify/decompress the character $s_p$ by performing at most $c$ accesses to the LZ encoding, for a given integer $c$. We refer to such a parsing $\varphi$ as a $c$-bounded access LZ parsing, or $c$-BLZ parsing, of ${\bf s}$. We show that, for any constant $c$, the problem of computing an optimal $c$-BLZ parsing of a string, i.e., one with the minimum number of phrases, is NP-hard and also APX-hard, i.e., no PTAS can exist under the standard complexity assumption $P \neq NP$. We also study the ratio between the size of an optimal $c$-BLZ parsing of a string ${\bf s}$ and the size of an optimal LZ76 parsing of ${\bf s}$ (which can be computed greedily in polynomial time).
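To make the access constraint concrete, the following is a minimal sketch, not the construction used in this work. It assumes a simplified LZ76-like model in which each phrase is either a single literal character or a copy of a substring lying entirely in the already parsed prefix, and it assumes that reading a literal costs one access while reading a copied character costs one access plus the cost of its source position. The function names `greedy_parse` and `access_counts` are hypothetical and introduced only for illustration.

```python
def greedy_parse(s):
    """Greedy parse under the assumed model.

    Returns a list of phrases (start, length, source), where source is
    None for a literal phrase and otherwise the start of the copied
    substring, which must lie entirely inside s[:start].
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, None
        # Try every earlier start j and extend the match as far as possible
        # without letting the source run past position i.
        for j in range(i):
            l = 0
            while i + l < n and j + l < i and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:
                best_len, best_src = l, j
        if best_len == 0:
            phrases.append((i, 1, None))  # unseen character: literal phrase
            i += 1
        else:
            phrases.append((i, best_len, best_src))
            i += best_len
    return phrases


def access_counts(s, phrases):
    """Number of accesses needed to recover each character (assumed cost model)."""
    acc = [0] * len(s)
    for start, length, src in phrases:
        for k in range(length):
            # Sources precede the phrase, so acc[src + k] is already known.
            acc[start + k] = 1 if src is None else 1 + acc[src + k]
    return acc


s = "abababab"
phrases = greedy_parse(s)
counts = access_counts(s, phrases)
print(phrases)      # [(0, 1, None), (1, 1, None), (2, 2, 0), (4, 4, 0)]
print(max(counts))  # 3
```

Under these assumptions the greedy parsing of `abababab` has 4 phrases but needs up to 3 accesses for the last two characters, so it would be a 3-BLZ parsing and not a 2-BLZ parsing. Bounding the access count can therefore force a parsing to use sources with shallower reference chains, possibly at the cost of more phrases.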