The Longest Common Subsequence (LCS) is a fundamental string similarity measure, and computing the LCS of two strings is a classic algorithms question. The textbook dynamic programming algorithm computes the LCS exactly in quadratic time, and this is essentially best possible under plausible fine-grained complexity assumptions, so a natural problem is to find faster approximation algorithms. When the inputs are two binary strings, there is a simple $\frac{1}{2}$-approximation in linear time: compute the longest common all-0s or all-1s subsequence. It has been open whether a better approximation is possible even in truly subquadratic time. Rubinstein and Song showed that the answer is yes under the assumption that the two input strings have equal lengths. We settle the question, generalizing their result to unequal-length strings and proving that, for any $\varepsilon>0$, there exist $\delta>0$ and a $(\frac{1}{2}+\delta)$-approximation algorithm for binary LCS that runs in $n^{1+\varepsilon}$ time. As a consequence of our result and a result of Akmal and Vassilevska Williams, for any $\varepsilon>0$, there exist $\delta>0$ and a $(\frac{1}{q}+\delta)$-approximation algorithm for LCS over $q$-ary strings that runs in $n^{1+\varepsilon}$ time. Our techniques build on the recent work of Guruswami, He, and Li, who proved new bounds for error-correcting codes tolerating deletion errors. They prove a combinatorial "structure lemma" for strings which classifies them according to their oscillation patterns. We prove and use an algorithmic generalization of this structure lemma, which may be of independent interest.
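To make the two baselines concrete, here is a minimal Python sketch of the textbook quadratic-time exact LCS computation and of the linear-time $\frac{1}{2}$-approximation for binary strings. It is not the $(\frac{1}{2}+\delta)$-approximation algorithm of this work, and the function names `lcs_length` and `half_approx_binary_lcs` are hypothetical, chosen for illustration. The approximation guarantee follows because any common subsequence consists of some number $k_0$ of 0s and $k_1$ of 1s, where $k_0 \le \min(\#_0(a), \#_0(b))$ and $k_1 \le \min(\#_1(a), \#_1(b))$, so the larger of these two minima is at least half the LCS length.

```python
def lcs_length(a: str, b: str) -> int:
    """Exact LCS length via the textbook O(|a| * |b|) dynamic program."""
    # prev[j] is the LCS length of the prefix of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                # Extend the best common subsequence of the shorter prefixes.
                cur[j] = prev[j - 1] + 1
            else:
                # Drop the last symbol of either prefix.
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def half_approx_binary_lcs(a: str, b: str) -> int:
    """Length of the longest common all-0s or all-1s subsequence (linear time).

    Assumes a and b are binary strings over the characters '0' and '1'.
    """
    zeros = min(a.count("0"), b.count("0"))  # longest common all-0s subsequence
    ones = min(a.count("1"), b.count("1"))   # longest common all-1s subsequence
    return max(zeros, ones)
```

For example, with the unequal-length inputs `"000111"` and `"01"`, the exact LCS is `"01"` (length 2), while the baseline returns 1, exactly half of the optimum.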