Repetitiveness measures quantify how much repetitive structure a string contains and serve as parameters for compressed representations and indexing data structures. We study the measure $χ$, defined as the size of the smallest suffixient set of a string. Although $χ$ has been studied extensively, its reachability, that is, whether every string $w$ admits a representation of size $O(χ(w))$ words, has remained an important open problem. We answer this question affirmatively by presenting the first such representation scheme. Our construction is based on a new model, the substring equation system (SES), and we show that every string $w$ admits an SES of size $O(χ(w))$.
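To make the measure $χ$ concrete, the following is a minimal brute-force sketch for very short strings. It is not the construction from this work. It assumes the definition commonly used in the suffixient-set literature, which the abstract does not spell out: a set of positions is suffixient if every one-character right-extension of every right-maximal substring is a suffix of some prefix ending at a chosen position. The appended sentinel `$` and the function names `right_extensions` and `smallest_suffixient_set` are illustrative assumptions.

```python
from itertools import combinations


def right_extensions(text):
    """Collect x + a for every right-maximal substring x of text.

    Assumed definition: x (possibly empty) is right-maximal if it is
    followed by at least two distinct characters somewhere in text.
    """
    followers = {}  # substring x -> set of characters following an occurrence of x
    n = len(text)
    for i in range(n):
        for j in range(i, n):
            followers.setdefault(text[i:j], set()).add(text[j])
    return {x + a for x, chars in followers.items() if len(chars) >= 2 for a in chars}


def smallest_suffixient_set(w, sentinel="$"):
    """Return a minimum-size suffixient set of w + sentinel (1-based positions).

    Exhaustive search over position subsets; exponential time, so only
    suitable for toy inputs. The sentinel convention is an assumption.
    """
    text = w + sentinel
    required = right_extensions(text)
    # covers[j]: required extensions that are suffixes of the prefix text[:j]
    covers = {
        j: {e for e in required if text[:j].endswith(e)}
        for j in range(1, len(text) + 1)
    }
    for size in range(len(text) + 1):
        for candidate in combinations(covers, size):
            covered = set().union(*(covers[j] for j in candidate))
            if covered == required:
                return set(candidate)
    raise AssertionError("unreachable: the set of all positions is suffixient")


def chi(w):
    """Size of a smallest suffixient set under the assumptions above."""
    return len(smallest_suffixient_set(w))
```

Under these assumptions, `smallest_suffixient_set("aaa")` returns `{3, 4}`, so `chi("aaa")` is 2: position 3 covers the extensions `a`, `aa`, and `aaa`, and position 4 covers `$`, `a$`, and `aa$`.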