We study the fundamental problem of finding the best string to represent a given set, in the form of the Closest String problem: Given a set $X \subseteq \Sigma^d$ of $n$ strings, find the string $x^*$ minimizing the radius of the smallest Hamming ball around $x^*$ that encloses all the strings in $X$. In this paper, we investigate whether the Closest String problem admits algorithms that are faster than the trivial exhaustive search algorithm. We obtain the following results for the two natural versions of the problem: $\bullet$ In the continuous Closest String problem, the goal is to find the solution string $x^*$ anywhere in $\Sigma^d$. For binary strings, the exhaustive search algorithm runs in time $O(2^d \,\mathrm{poly}(nd))$, and we prove that it cannot be improved to time $O(2^{(1-\epsilon) d} \,\mathrm{poly}(nd))$, for any $\epsilon > 0$, unless the Strong Exponential Time Hypothesis fails. $\bullet$ In the discrete Closest String problem, $x^*$ is required to be in the input set $X$. While this problem is clearly solvable in polynomial time, its fine-grained complexity has been pinpointed to be quadratic time $n^{2 \pm o(1)}$ whenever the dimension satisfies $\omega(\log n) < d < n^{o(1)}$. We complement this known hardness result with new algorithms, proving essentially that whenever $d$ falls outside this hard range, the discrete Closest String problem can be solved faster than exhaustive search. In the small-$d$ regime, our algorithm is based on a novel application of the inclusion-exclusion principle. Interestingly, all of our results apply to (and some are even stronger for) the natural dual of the Closest String problem, called the \emph{Remotest String} problem, where the task is to find a string maximizing the minimum Hamming distance to the strings in $X$.
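To make the exhaustive search baselines concrete, the following is a minimal Python sketch of the trivial algorithms for the continuous and discrete Closest String problems, together with the brute-force analogue for Remotest String. It does not reproduce any of the faster algorithms from the paper. The function names (`hamming`, `radius`, `continuous_closest_string`, `discrete_closest_string`, `remotest_string`) and the choice to represent strings as Python `str` values over a default binary alphabet are assumptions made for illustration only.

```python
from itertools import product


def hamming(a, b):
    """Number of positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def radius(center, strings):
    """Radius of the smallest Hamming ball around `center` enclosing all strings."""
    return max(hamming(center, s) for s in strings)


def continuous_closest_string(strings, alphabet="01"):
    """Exhaustive search over all |alphabet|^d candidate centers."""
    d = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=d))
    best = min(candidates, key=lambda c: radius(c, strings))
    return best, radius(best, strings)


def discrete_closest_string(strings):
    """Exhaustive search restricted to the input set: O(n^2 d) time."""
    best = min(strings, key=lambda c: radius(c, strings))
    return best, radius(best, strings)


def remotest_string(strings, alphabet="01"):
    """Dual problem: maximize the minimum Hamming distance to the input strings."""
    d = len(strings[0])
    candidates = ("".join(c) for c in product(alphabet, repeat=d))
    best = max(candidates, key=lambda c: min(hamming(c, s) for s in strings))
    return best, min(hamming(best, s) for s in strings)


if __name__ == "__main__":
    X = ["0000", "0011", "0101"]
    print(continuous_closest_string(X))  # center may lie outside X
    print(discrete_closest_string(X))    # center must be a member of X
    print(remotest_string(X))
```

On this small input the continuous optimum has radius 1 while the best center inside $X$ has radius 2, which illustrates why the two versions are treated as distinct problems.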