Detecting and measuring the repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. However, when the data are structured in a non-linear way, as in the case of two-dimensional strings, their inherent redundancy offers a rich source for compression, yet systematic studies of repetitiveness measures are still lacking. In this paper, we introduce extensions of repetitiveness measures to general two-dimensional strings. In particular, we propose a new extension of the measures $\delta$ and $\gamma$ that diverges from the previous square-based definitions proposed in [Carfagna and Manzini, SPIRE 2023]. We further consider generalizations of macro schemes and straight-line programs (SLPs) to the 2D setting and show that, in contrast to what happens on strings, 2D macro schemes and 2D SLPs can both be asymptotically smaller than $\delta$ and $\gamma$. The results of the paper can be easily extended to $d$-dimensional strings with $d > 2$.
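To make the contrast between square-based and general definitions of $\delta$ concrete, the following is a minimal brute-force sketch. It assumes, since the abstract does not state the formulas, that the general measure is the maximum over all window shapes $k_1 \times k_2$ of the number of distinct $k_1 \times k_2$ submatrices divided by $k_1 k_2$, and that the square-based variant restricts this to $k_1 = k_2$. The function name `delta_2d` and the flag `square_only` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def delta_2d(grid, square_only=False):
    """Brute-force delta for a 2D string given as a list of equal-length rows.

    Assumed definition (not taken from the abstract): the maximum, over
    window shapes k1 x k2, of (#distinct k1 x k2 submatrices) / (k1 * k2).
    With square_only=True, only windows with k1 == k2 are considered.
    """
    m, n = len(grid), len(grid[0])
    best = Fraction(0)
    for k1 in range(1, m + 1):
        for k2 in range(1, n + 1):
            if square_only and k1 != k2:
                continue
            # Collect every k1 x k2 window as a hashable tuple of row slices.
            seen = set()
            for i in range(m - k1 + 1):
                for j in range(n - k2 + 1):
                    seen.add(tuple(row[j:j + k2] for row in grid[i:i + k1]))
            best = max(best, Fraction(len(seen), k1 * k2))
    return best


if __name__ == "__main__":
    one_row = ["0001011100"]  # a single row containing all 8 binary 3-mers
    print(delta_2d(one_row))                    # rectangular windows allowed
    print(delta_2d(one_row, square_only=True))  # only 1 x 1 windows fit here
```

On the one-row input above, the rectangular variant can exploit $1 \times 3$ windows while the square-restricted variant sees only single cells, which illustrates why the two definitions can differ. The sketch runs in time polynomial but far from optimal in the input size and is meant only to clarify the definition under the stated assumptions.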