We study the impact that string reversal can have on several repetitiveness measures. First, we exhibit an infinite family of strings for which the number $r$ of runs in the run-length encoding of the Burrows--Wheeler transform (BWT) increases additively by $\Theta(n)$ when the string is reversed. This substantially improves the previously known $\Omega(\log n)$ lower bound on the additive sensitivity of $r$, and it is asymptotically tight. We generalize this result to other variants of the BWT, including the variant with an appended end-of-string symbol and the bijective BWT. We show that an analogous result holds for the size $z$ of the Lempel--Ziv 77 (LZ) parsing of the text, as well as for some of its variants, including the non-overlapping LZ parsing and the LZ-end parsing. Moreover, we describe a family of strings for which the ratio $z(w^R)/z(w)$ approaches $3$ from below as $|w| \rightarrow \infty$. We also show an asymptotically tight $\Omega(n)$ lower bound on the additive sensitivity to string reversal of the size $v$ of the smallest lexicographic parsing. Finally, we show that the multiplicative sensitivity of $v$ to string reversal is $\Theta(\log n)$, i.e., our $\Omega(\log n)$ lower bound is also tight. Overall, our results expose the limitations of widely used repetitiveness measures under string reversal, a simple and natural data transformation.
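To make the measures $r$ and $z$ concrete, the following is a minimal brute-force sketch (not the authors' code) that computes both for a string and its reverse. It assumes that the BWT is the last column of the sorted cyclic rotations, optionally with an end-of-string sentinel (here a NUL character, assumed absent from the input and sorting before every other letter), and that each LZ77 phrase is the longest prefix of the remaining suffix with an earlier, possibly overlapping, occurrence, or a single fresh letter otherwise. The function names `bwt_runs` and `lz77_phrases` are hypothetical. The sketch does not cover the bijective BWT, the non-overlapping or LZ-end parsings, or $v$.

```python
def bwt_runs(w: str, end_marker: bool = False) -> int:
    """Return r, the number of maximal equal-letter runs in the BWT of w.

    Brute-force sketch: the BWT is taken as the last column of the sorted
    cyclic rotations of w. If end_marker is True, a NUL sentinel is
    appended first; it is assumed not to occur in w and sorts before
    every other character.
    """
    s = w + "\x00" if end_marker else w
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    bwt = "".join(rot[-1] for rot in rotations)
    # A new run starts at position 0 and wherever the letter changes.
    return sum(1 for i in range(len(bwt)) if i == 0 or bwt[i] != bwt[i - 1])


def lz77_phrases(w: str) -> int:
    """Return z, the number of phrases in a greedy LZ77 parsing of w.

    Assumed definition: each phrase is the longest prefix of the remaining
    suffix that also starts at an earlier position (overlaps allowed),
    or a single letter if no such prefix exists.
    """
    n, i, z = len(w), 0, 0
    while i < n:
        longest = 0
        for j in range(i):  # every earlier starting position
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)  # a fresh letter forms a phrase of length 1
        z += 1
    return z


if __name__ == "__main__":
    for w in ["abb", "banana"]:
        wr = w[::-1]
        print(w, wr,
              bwt_runs(w, end_marker=True), bwt_runs(wr, end_marker=True),
              lz77_phrases(w), lz77_phrases(wr))
```

Even on tiny inputs the end-marker variant of $r$ can change under reversal: the sketch gives $r = 4$ for "abb" (BWT b, NUL, b, a) but $r = 3$ for "bba" (BWT a, b, b, NUL). The quadratic brute force is chosen for readability only; it is meant for experimenting with short strings, not for the asymptotic families discussed above.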