Almost-Optimal Sublinear-Time Edit Distance in the Low Distance Regime

We revisit the task of computing the edit distance in sublinear time. In the $(k,K)$-gap edit distance problem the task is to distinguish whether the edit distance of two strings is at most $k$ or at least $K$. It has been established by Goldenberg, Krauthgamer and Saha (FOCS '19), with improvements by Kociumaka and Saha (FOCS '20), that the $(k,k^2)$-gap problem can be solved in time $\widetilde O(n/k+\operatorname{poly}(k))$. One of the most natural questions in this line of research is whether the $(k,k^2)$-gap is best-possible for the running time $\widetilde O(n/k+\operatorname{poly}(k))$. In this work we answer this question by significantly improving the gap. Specifically, we show that in time $O(n/k+\operatorname{poly}(k))$ we can even solve the $(k,k^{1+o(1)})$-gap problem. This is the first algorithm that breaks the $(k,k^2)$-gap in this running time. Our algorithm is almost optimal in the following sense: In the low distance regime ($k\le n^{0.19}$) our running time becomes $O(n/k)$, which matches a known $n/k^{1+o(1)}$ lower bound for the $(k,k^{1+o(1)})$-gap problem up to lower order factors. Our result also reveals a surprising similarity of Hamming distance and edit distance in the low distance regime: For both, the $(k,k^{1+o(1)})$-gap problem has time complexity $n/k^{1\pm o(1)}$ for small $k$. In contrast to previous work, which employed a subsampled variant of the Landau-Vishkin algorithm, we instead build upon the algorithm of Andoni, Krauthgamer and Onak (FOCS '10). We first simplify their approach and then show how to effectively prune their computation tree in order to obtain a sublinear-time algorithm in the given time bound. Towards that, we use a variety of structural insights on the (local and global) patterns that can emerge during this process and design appropriate property testers to effectively detect these patterns.
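To make the $(k,K)$-gap problem statement concrete, here is a minimal Python sketch of an exact baseline. It is not the sublinear-time algorithm of the paper: it reads both strings in full and uses a standard banded dynamic program, so it takes roughly $O(nk)$ time. The function names `bounded_edit_distance` and `gap_edit_distance` are hypothetical and chosen only for illustration.

```python
from typing import Optional


def bounded_edit_distance(x: str, y: str, k: int) -> Optional[int]:
    """Return the edit distance of x and y if it is at most k, else None.

    Only DP cells (i, j) with |i - j| <= k are filled in, because any
    alignment of cost at most k never leaves that diagonal band.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None  # the length difference alone already forces > k edits
    inf = k + 1  # every value above k is clamped to this sentinel
    # Row 0: turning the empty prefix of x into y[:j] costs j insertions.
    prev = {j: j for j in range(0, min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - k), min(m, i + k)
        for j in range(lo, hi + 1):
            if j == 0:
                best = i  # delete the first i characters of x
            else:
                # match or substitute x[i-1] with y[j-1]
                best = prev.get(j - 1, inf) + (x[i - 1] != y[j - 1])
                # insert y[j-1]
                best = min(best, cur.get(j - 1, inf) + 1)
                # delete x[i-1]
                best = min(best, prev.get(j, inf) + 1)
            cur[j] = min(best, inf)
        prev = cur
    d = prev.get(m, inf)
    return d if d <= k else None


def gap_edit_distance(x: str, y: str, k: int, K: int) -> str:
    """Decide the (k, K)-gap problem exactly.

    Returns "CLOSE" if the edit distance is at most k and "FAR" otherwise.
    Inputs with distance strictly between k and K may receive either
    answer under the gap promise; this baseline answers "FAR" for them.
    """
    assert k < K
    return "CLOSE" if bounded_edit_distance(x, y, k) is not None else "FAR"
```

For instance, `gap_edit_distance("kitten", "sitting", 3, 9)` returns `"CLOSE"`, since the edit distance of the two strings is 3. A sublinear-time algorithm must reach such decisions while inspecting only a small fraction of the input, which is why it can only be correct with high probability and only under the gap promise.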


