The edit distance of two strings is the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. The textbook dynamic-programming algorithm computes the edit distance of two length-$n$ strings in $O(n^2)$ time, which is optimal up to subpolynomial factors under SETH. An established way of circumventing this hardness is to consider the bounded setting, where the running time is parameterized by the edit distance $k$. A celebrated algorithm by Landau and Vishkin (JCSS '88) achieves time $O(n + k^2)$, which is optimal as a function of $n$ and $k$. Most practical applications rely on a more general weighted edit distance, where each edit has a weight depending on its type and on the characters from the alphabet $\Sigma$ involved. This is formalized through a weight function $w : (\Sigma\cup\{\varepsilon\})\times(\Sigma\cup\{\varepsilon\})\to\mathbb{R}$, normalized so that $w(a,a)=0$ for all $a \in \Sigma\cup\{\varepsilon\}$ and $w(a,b)\geq 1$ for all $a,b \in \Sigma\cup\{\varepsilon\}$ with $a \neq b$; the goal is to find an alignment of the two strings that minimizes the total weight of the edits. The $O(n^2)$-time algorithm supports this setting seamlessly, but only very recently did Das, Gilbert, Hajiaghayi, Kociumaka, and Saha (STOC '23) give the first non-trivial algorithm for the bounded version, achieving time $O(n + k^5)$. While this running time is linear for $k\le n^{1/5}$, it is still very far from the $O(n+k^2)$ bound achievable in the unweighted setting. In this paper, we essentially close this gap by showing both an improved $\tilde O(n+\sqrt{nk^3})$-time algorithm and, more surprisingly, a matching lower bound: conditioned on the All-Pairs Shortest Paths (APSP) hypothesis, our running time is optimal for $\sqrt{n}\le k\le n$ (up to subpolynomial factors). This is the first separation between the complexity of the weighted and unweighted edit distance problems.
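To make the definitions concrete, the following is a minimal Python sketch, assuming the weight function is supplied as a callable `w(a, b)` in which the empty string `""` stands for $\varepsilon$. The first function is the textbook quadratic dynamic program, which handles arbitrary weights. The second restricts the same recurrence to the diagonal band $|i-j|\le k$: since every edit costs at least 1, an alignment of total weight at most $k$ uses at most $k$ insertions and deletions and therefore never leaves that band, giving a simple $O(nk)$-time bounded check. This banding is a standard illustration only; it is neither the $O(n+k^5)$ algorithm of Das et al. nor the $\tilde O(n+\sqrt{nk^3})$ algorithm described above. All names (`EPS`, `unit_weight`, `weighted_edit_distance`, `bounded_weighted_edit_distance`) are hypothetical.

```python
from typing import Callable, Optional

EPS = ""  # hypothetical encoding of the empty string epsilon
Weight = Callable[[str, str], float]


def unit_weight(a: str, b: str) -> float:
    """Unweighted edit distance: every insertion, deletion, or substitution costs 1."""
    return 0 if a == b else 1


def weighted_edit_distance(x: str, y: str, w: Weight) -> float:
    """Textbook O(|x|*|y|) DP; D[i][j] = min total weight aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + w(x[i - 1], EPS)  # delete x[i-1]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + w(EPS, y[j - 1])  # insert y[j-1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + w(x[i - 1], EPS),           # deletion
                D[i][j - 1] + w(EPS, y[j - 1]),           # insertion
                D[i - 1][j - 1] + w(x[i - 1], y[j - 1]),  # match (weight 0) or substitution
            )
    return D[n][m]


def bounded_weighted_edit_distance(x: str, y: str, w: Weight, k: float) -> Optional[float]:
    """Return the weighted edit distance if it is at most k, otherwise None.

    Assumes w(a, a) = 0 and w(a, b) >= 1 for a != b, so any cell (i, j) with
    |i - j| > k already costs more than k and can be skipped.
    Only the band |i - j| <= k is stored, one dict per row: O(|x| * k) time.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None
    INF = float("inf")
    prev = {0: 0.0}  # row i = 0, restricted to the band
    for j in range(1, min(m, int(k)) + 1):
        prev[j] = prev[j - 1] + w(EPS, y[j - 1])
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - int(k)), min(m, i + int(k)) + 1):
            best = prev.get(j, INF) + w(x[i - 1], EPS)  # deletion
            if j > 0:
                best = min(
                    best,
                    cur.get(j - 1, INF) + w(EPS, y[j - 1]),           # insertion
                    prev.get(j - 1, INF) + w(x[i - 1], y[j - 1]),    # match or substitution
                )
            cur[j] = best  # cells outside the band are treated as infinite
        prev = cur
    d = prev.get(m, INF)
    return d if d <= k else None


print(weighted_edit_distance("kitten", "sitting", unit_weight))             # 3.0
print(bounded_weighted_edit_distance("kitten", "sitting", unit_weight, 2))  # None
```

If $k$ is not known in advance, one option is to call the bounded check with $k = 1, 2, 4, \dots$ until it returns a value; the geometric growth keeps the total time within a constant factor of the last call.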