LZ-End is a variant of the well-known Lempel-Ziv parsing family in which each phrase of the parsing has a previous occurrence, with the additional constraint that this previous occurrence must end at the end of a previous phrase. LZ-End was initially proposed as a greedy parsing, in which each phrase is determined greedily from left to right as the longest factor that satisfies this constraint~[Kreft & Navarro, 2010]. In this work, we consider optimal LZ-End parsing, that is, an LZ-End parsing with the minimum number of phrases. We show that the decision version of computing an optimal LZ-End parsing is NP-complete, via a reduction from the vertex cover problem. Moreover, we give a MAX-SAT formulation for optimal LZ-End parsing, adapting an approach recently presented by [Bannai et al., 2022] for computing various NP-hard repetitiveness measures. We also consider the approximation ratio of the size of the greedy LZ-End parsing to the size of an optimal LZ-End parsing, and give a lower bound on this ratio that asymptotically approaches $2$.
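To make the greedy-versus-optimal distinction concrete, the following is a minimal Python sketch. It assumes the Kreft & Navarro-style convention in which each phrase is a copied string X followed by one explicit character, where X (possibly empty) must occur as a suffix of the text prefix ending at some earlier phrase boundary; the formal definition used in this work may differ in details. The function names `greedy_lz_end`, `is_valid_lz_end`, and `optimal_lz_end_size` are hypothetical. The greedy routine is naive and unoptimized, and the optimal size is found by exhaustive search, which is practical only for tiny inputs and is not the MAX-SAT formulation described above.

```python
def greedy_lz_end(text: str) -> list[str]:
    """Naive greedy LZ-End parsing under an ASSUMED convention: each phrase is
    X + c, where X is the longest prefix of the remaining text that is a suffix
    of text[:b] for some earlier phrase end b, and c is one explicit character."""
    n = len(text)
    phrases: list[str] = []
    ends: list[int] = []  # exclusive end positions of the phrases so far
    i = 0
    while i < n:
        best = 0
        for b in ends:
            # X must fit inside text[:b] and leave room for the explicit char c.
            for length in range(1, min(b, n - i - 1) + 1):
                if text[i:i + length] == text[b - length:b]:
                    best = max(best, length)
        j = i + best + 1  # include the explicit trailing character
        phrases.append(text[i:j])
        ends.append(j)
        i = j
    return phrases


def is_valid_lz_end(text: str, phrases: list[str]) -> bool:
    """Check a factorization against the same ASSUMED convention."""
    if "".join(phrases) != text or any(p == "" for p in phrases):
        return False
    ends: list[int] = []
    pos = 0
    for p in phrases:
        x = p[:-1]  # copied part; p[-1] is the explicit character
        if x and not any(b >= len(x) and text[b - len(x):b] == x for b in ends):
            return False
        pos += len(p)
        ends.append(pos)
    return True


def optimal_lz_end_size(text: str) -> int:
    """Minimum number of phrases by exhaustive search (exponential time)."""
    n = len(text)
    best = n  # n single-character phrases always form a valid parsing

    def search(i: int, ends: list[int]) -> None:
        nonlocal best
        if i == n:
            best = min(best, len(ends))
            return
        if len(ends) + 1 >= best:
            return  # at least one more phrase is needed; cannot improve
        for j in range(n, i, -1):  # try longer phrases first
            x = text[i:j - 1]
            if not x or any(b >= len(x) and text[b - len(x):b] == x for b in ends):
                search(j, ends + [j])

    search(0, [])
    return best


if __name__ == "__main__":
    text = "aaaaaaaa"
    greedy = greedy_lz_end(text)
    print(greedy)  # ['a', 'aa', 'aaaa', 'a']
    print(is_valid_lz_end(text, greedy), optimal_lz_end_size(text))  # True 4
```

Because each greedy phrase is valid by construction, the exhaustive optimum can never exceed the greedy size; the ratio between the two is the quantity whose lower bound is studied here.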