Edit distance is a fundamental measure of sequence similarity, defined as the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. Given two strings of length at most $n$, simple dynamic programming computes their edit distance exactly in $O(n^2)$ time, which is also the best possible (up to subpolynomial factors) assuming the Strong Exponential Time Hypothesis (SETH). The last few decades have seen tremendous progress in edit distance approximation, where the runtime has been brought down to subquadratic, near-linear, and even sublinear at the cost of approximation. In this paper, we study the dynamic edit distance problem, where the strings change over time as characters are substituted, inserted, or deleted. Each change may happen at any location of either of the two strings. The goal is to maintain the (exact or approximate) edit distance of such dynamic strings while minimizing the update time. The exact edit distance can be maintained in $\tilde{O}(n)$ time per update (Charalampopoulos, Kociumaka, Mozes; 2020), which is again tight assuming SETH. Unfortunately, even with the unprecedented progress in edit distance approximation in the static setting, strikingly little is known regarding dynamic edit distance approximation. Using off-the-shelf tools, it is possible to achieve an $O(n^{c})$-approximation in $n^{0.5-c+o(1)}$ update time for any constant $c\in [0,\frac16]$. Improving upon this trade-off remains open. The contribution of this work is a dynamic $n^{o(1)}$-approximation algorithm with amortized expected update time of $n^{o(1)}$. In other words, we bring the product of the approximation ratio and the update time down to $n^{o(1)}$. Our solution builds on the elegant precision sampling tree framework for edit distance approximation (Andoni, Krauthgamer, Onak; 2010).
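For concreteness, the quadratic-time baseline mentioned above can be sketched as follows. This is only the textbook static dynamic program, not the dynamic approximation algorithm contributed by this work; the function name `edit_distance` and the two-row layout are illustrative choices rather than anything specified in the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook dynamic program: O(len(a) * len(b)) time, O(len(b)) space."""
    # prev[j] = edit distance between the empty prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # curr[0] = cost of deleting the first i characters of a
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute ca by cb, or match for free
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # prints 3
```

Recomputing this table from scratch after every update costs $O(n^2)$ time per update, which is the naive bound that the dynamic results discussed above improve upon.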