In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the $\gamma$ measure, defined in terms of the smallest string attractor, and the $\delta$ measure, defined in terms of the number of distinct substrings of the input string. Concretely, we introduce the two-dimensional measures $\gamma_{2D}$ and $\delta_{2D}$ as natural generalizations of $\gamma$ and $\delta$, and we initiate the study of their properties. Among other things, we prove that $\delta_{2D}$ is monotone and can be computed in linear time. We also show that, although $\delta_{2D} \leq \gamma_{2D}$ still holds, the gap between the two measures can be $\Omega(\sqrt{n})$ for families of $n\times n$ matrices, and is therefore asymptotically larger than the gap between $\gamma$ and $\delta$. To complete the picture of two-dimensional compressibility measures, we introduce the measure $b_{2D}$, which generalizes the notion of optimal parsing to two dimensions. We prove that, somewhat surprisingly, the relationship between $b_{2D}$ and $\gamma_{2D}$ is significantly different from the one-dimensional case. As an application of our results, we provide the first analysis of the space usage of the two-dimensional block tree introduced in [Brisaboa et al., Two-dimensional block trees, The Computer Journal, 2023]. Our analysis shows that the space usage can be bounded in terms of both $\gamma_{2D}$ and $\delta_{2D}$, providing a theoretical justification for the use of this data structure. Finally, using insights from our analysis, we design the first linear-time, linear-space algorithm for constructing the two-dimensional block tree for arbitrary matrices. Our algorithm is asymptotically faster than the best known solution, which is probabilistic and only works for binary matrices.
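To make the substring-counting idea behind $\delta$ and $\delta_{2D}$ concrete, the following is a minimal brute-force sketch. It assumes the usual definitions $\delta(S) = \max_k d_k(S)/k$, where $d_k(S)$ is the number of distinct length-$k$ substrings of $S$, and $\delta_{2D}(M) = \max_k d_{k\times k}(M)/k^2$, where $d_{k\times k}(M)$ is the number of distinct $k\times k$ submatrices of $M$. These formulas are not spelled out in the abstract, and the function names `delta_1d` and `delta_2d` are hypothetical. The code enumerates all substrings and submatrices explicitly, so it is far from the linear-time computation claimed above and is meant only as an illustration.

```python
from typing import Sequence


def delta_1d(s: str) -> float:
    """Brute-force delta: max over k of (distinct length-k substrings) / k.

    Assumed definition; quadratic or worse, for illustration only.
    """
    n = len(s)
    best = 0.0
    for k in range(1, n + 1):
        # Collect every length-k substring and count the distinct ones.
        distinct = {s[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


def delta_2d(m: Sequence[Sequence[int]]) -> float:
    """Brute-force 2D analogue: max over k of (distinct k x k submatrices) / k^2.

    Assumed definition; this is NOT the linear-time algorithm from the paper.
    """
    rows = len(m)
    cols = len(m[0]) if rows else 0
    best = 0.0
    for k in range(1, min(rows, cols) + 1):
        distinct = set()
        for i in range(rows - k + 1):
            for j in range(cols - k + 1):
                # Represent the k x k block as a hashable tuple of row tuples.
                block = tuple(tuple(m[i + r][j:j + k]) for r in range(k))
                distinct.add(block)
        best = max(best, len(distinct) / (k * k))
    return best
```

Under these assumed definitions, a constant input such as `"aaaa"` or an all-zero matrix gives the value 1.0, while a $2\times 2$ matrix with four distinct entries gives 4.0, since the maximum is already reached at $k = 1$.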