A factorization $f_1, \ldots, f_m$ of a string $w$ of length $n$ is called a repetition factorization of $w$ if each $f_i$ is a repetition, i.e., $f_i$ is of the form $x^kx'$, where $x$ is a non-empty string, $x'$ is a (possibly empty) proper prefix of $x$, and $k \geq 2$. Dumitran et al. [SPIRE 2015] presented an algorithm that computes an arbitrary repetition factorization of a given string of length $n$ in $O(n)$ time and space. Their algorithm relies heavily on the Union-Find data structure for trees of Gabow and Tarjan [JCSS 1985], which runs in linear time on the word RAM model, and on an interval stabbing data structure of Schmidt [ISAAC 2009]. In this paper, we explore further combinatorial properties of the problem and present a simple algorithm that computes an arbitrary repetition factorization of a given string of length $n$ in $O(n)$ time, without relying on data structures for Union-Find or interval stabbing. Our algorithm follows the approach of Inoue et al. [ToCS 2022], which computes the smallest/largest repetition factorization in $O(n \log n)$ time.
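To make the definition concrete, the following is a minimal Python sketch. It is a naive $O(n^2)$-time baseline, not the linear-time algorithm of this paper or of Dumitran et al. It uses the fact that a string $s$ is a repetition if and only if its smallest period $p$ satisfies $2p \leq |s|$; the smallest period is obtained from the KMP failure function as $|s|$ minus the length of the longest proper border. A dynamic program over prefixes of $w$ then records, for each reachable end position, the start of some repetition ending there. The function names `prefix_function`, `is_repetition`, and `repetition_factorization` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def prefix_function(s: str) -> List[int]:
    """KMP failure function: pi[j] is the length of the longest proper border of s[:j+1]."""
    pi = [0] * len(s)
    for j in range(1, len(s)):
        k = pi[j - 1]
        while k > 0 and s[j] != s[k]:
            k = pi[k - 1]
        if s[j] == s[k]:
            k += 1
        pi[j] = k
    return pi


def is_repetition(s: str) -> bool:
    """True iff s = x^k x' with x non-empty, x' a proper prefix of x, and k >= 2,
    which holds iff the smallest period p of s satisfies 2p <= |s|."""
    if not s:
        return False
    p = len(s) - prefix_function(s)[-1]  # smallest period of s
    return 2 * p <= len(s)


def repetition_factorization(w: str) -> Optional[List[str]]:
    """Naive O(n^2) DP (hypothetical baseline): returns some repetition
    factorization of w, or None if w has none."""
    n = len(w)
    # prev[j] = start of the last factor of a factorization of w[:j]; -1 if w[:j] has none.
    prev = [-1] * (n + 1)
    prev[0] = 0  # marks the empty prefix as (trivially) factorized
    for i in range(n):
        if prev[i] == -1:
            continue
        pi = prefix_function(w[i:])  # borders of every w[i:j] in one pass
        for length in range(2, n - i + 1):  # a repetition has length >= 2
            j = i + length
            period = length - pi[length - 1]  # smallest period of w[i:j]
            if 2 * period <= length and prev[j] == -1:
                prev[j] = i
    if n > 0 and prev[n] == -1:
        return None
    # Walk back from position n to recover the factors.
    factors = []
    j = n
    while j > 0:
        i = prev[j]
        factors.append(w[i:j])
        j = i
    return factors[::-1]
```

For example, `repetition_factorization("aaabab")` yields `["aa", "abab"]`, while `"abc"` has no repetition factorization, so the function returns `None`. Computing one failure function per reachable start position is what makes this baseline quadratic; the linear-time algorithms cited above avoid enumerating all candidate factors in this way.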