A $k$-attractor is a combinatorial object that unifies dictionary-based compression. It makes it possible to compare the repetitiveness measures of different dictionary compressors such as Lempel-Ziv 77, the Burrows-Wheeler transform, straight-line programs, and macro schemes. For a string $T \in \Sigma^n$, a $k$-attractor is a set of positions $\Gamma \subseteq [1,n]$ such that every distinct substring of length at most $k$ has at least one occurrence that contains a position in $\Gamma$. Thus, if a substring occurs multiple times in $T$, a single position in any one of its occurrences suffices to cover it. A smallest 1-attractor is easily computed in linear time, while Kempa and Prezza [STOC 2018] have shown that for $k \geq 3$ it is NP-complete to compute the smallest $k$-attractor, by a reduction from $k$-set cover. The main result of this paper answers the open question of the complexity of the 2-attractor problem, showing that the problem remains NP-complete. Kempa and Prezza's proof for $k \geq 3$ also reduces the 2-attractor problem to the 2-set cover problem, which is equivalent to edge cover, but that reduction does not fully capture the complexity of the 2-attractor problem. For this reason, we extend edge cover by a color function on the edges, yielding the colorful edge cover problem. Any edge cover must then satisfy the additional constraint that each color is represented. With this extension, colorful edge cover becomes NP-complete and also models the 2-attractor problem more precisely. We obtain a reduction showing that $k$-attractor is NP-complete and APX-hard for any $k \geq 2$.