We study the fundamental problem of finding the best string to represent a given set, in the form of the Closest String problem: given a set $X \subseteq \Sigma^d$ of $n$ strings, find the string $x^*$ minimizing the radius of the smallest Hamming ball around $x^*$ that encloses all the strings in $X$. In this paper, we investigate whether the Closest String problem admits algorithms that are faster than the trivial exhaustive search algorithm. We obtain the following results for the two natural versions of the problem: $\bullet$ In the continuous Closest String problem, the solution string $x^*$ may lie anywhere in $\Sigma^d$. For binary strings, the exhaustive search algorithm runs in time $O(2^d \, \mathrm{poly}(nd))$, and we prove that it cannot be improved to time $O(2^{(1-\epsilon) d} \, \mathrm{poly}(nd))$, for any $\epsilon > 0$, unless the Strong Exponential Time Hypothesis fails. $\bullet$ In the discrete Closest String problem, $x^*$ is required to be in the input set $X$. While this problem is clearly solvable in polynomial time, its fine-grained complexity has been pinpointed to be quadratic time $n^{2 \pm o(1)}$ whenever the dimension satisfies $\omega(\log n) < d < n^{o(1)}$. We complement this known hardness result with new algorithms, proving essentially that whenever $d$ falls outside this hard range, the discrete Closest String problem can be solved faster than exhaustive search. In the small-$d$ regime, our algorithm is based on a novel application of the inclusion-exclusion principle. Interestingly, all of our results apply (and some are even stronger) to the natural dual of the Closest String problem, called the Remotest String problem, where the task is to find a string maximizing its minimum Hamming distance to the strings in $X$.
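For concreteness, the exhaustive search baselines that the abstract compares against can be sketched as follows. This is a minimal illustration, not the paper's algorithms: the function names (`hamming`, `radius`, `continuous_closest_string`, `discrete_closest_string`) are hypothetical, strings are assumed to be equal-length Python `str` values, and the alphabet defaults to binary.

```python
from itertools import product


def hamming(a, b):
    """Number of positions where the equal-length strings a and b differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def radius(center, strings):
    """Radius of the smallest Hamming ball around `center` enclosing all strings."""
    return max(hamming(center, s) for s in strings)


def continuous_closest_string(strings, alphabet="01"):
    """Baseline for the continuous version: try every center in alphabet^d.

    There are |alphabet|^d candidates and each costs O(n d) to evaluate,
    which gives the O(2^d poly(nd)) bound for binary strings.
    """
    d = len(strings[0])
    candidates = ("".join(t) for t in product(alphabet, repeat=d))
    return min(candidates, key=lambda c: radius(c, strings))


def discrete_closest_string(strings):
    """Baseline for the discrete version: try every input string as the center.

    There are n candidates and each costs O(n d), so O(n^2 d) time overall.
    """
    return min(strings, key=lambda c: radius(c, strings))


X = ["0000", "0011", "0101"]
best_anywhere = continuous_closest_string(X)
best_in_input = discrete_closest_string(X)
print(best_anywhere, radius(best_anywhere, X))
print(best_in_input, radius(best_in_input, X))
```

On this small instance the two versions differ: every pair of input strings is at distance 2, so any center taken from $X$ has radius 2, while the center `0001` outside $X$ is at distance 1 from all three strings.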