The deletion distance between two binary words $u,v \in \{0,1\}^n$ is the smallest $k$ such that $u$ and $v$ share a common subsequence of length $n-k$. A set $C$ of binary words of length $n$ is called a $k$-deletion code if every pair of distinct words in $C$ has deletion distance greater than $k$. In 1965, Levenshtein initiated the study of deletion codes by showing that, for $k\ge 1$ fixed and $n$ going to infinity, a $k$-deletion code $C\subseteq \{0,1\}^n$ of maximum size satisfies $\Omega_k(2^n/n^{2k}) \leq |C| \leq O_k( 2^n/n^k)$. We make the first asymptotic improvement to these bounds by showing that there exist $k$-deletion codes with size at least $\Omega_k(2^n \log n/n^{2k})$. Our proof is inspired by Jiang and Vardy's improvement to the classical Gilbert--Varshamov bounds. We also establish several related results on the number of longest common subsequences and shortest common supersequences of a pair of words with given length and deletion distance.
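
To make the definitions concrete, the following is a minimal Python sketch, assuming words are represented as strings over the characters 0 and 1. It computes the deletion distance of two equal-length words as $n$ minus the length of a longest common subsequence, and checks the $k$-deletion code condition pairwise. The function names (`lcs_length`, `deletion_distance`, `is_k_deletion_code`, `greedy_code`) are hypothetical and chosen only for illustration. The greedy search at the end is a naive brute-force illustration for very small $n$; it is not the construction behind the $\Omega_k(2^n \log n/n^{2k})$ bound.

```python
from itertools import product


def lcs_length(u: str, v: str) -> int:
    """Length of a longest common subsequence of u and v (standard DP)."""
    prev = [0] * (len(v) + 1)
    for a in u:
        cur = [0]
        for j, b in enumerate(v, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def deletion_distance(u: str, v: str) -> int:
    """Smallest k such that u and v share a common subsequence of length n - k."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return len(u) - lcs_length(u, v)


def is_k_deletion_code(words, k: int) -> bool:
    """True if every pair of distinct words has deletion distance greater than k."""
    words = list(words)
    return all(
        deletion_distance(words[i], words[j]) > k
        for i in range(len(words))
        for j in range(i + 1, len(words))
    )


def greedy_code(n: int, k: int):
    """Naive greedy k-deletion code: scan all 2^n words, keep the compatible ones.

    Illustration only; exponential in n, so usable for small n.
    """
    code = []
    for bits in product("01", repeat=n):
        w = "".join(bits)
        if all(deletion_distance(w, c) > k for c in code):
            code.append(w)
    return code
```

For example, `deletion_distance("0101", "1010")` is 1, since the two words share the common subsequence `010` of length 3. Any output of `greedy_code(n, k)` satisfies `is_k_deletion_code(..., k)` by construction, though its size need not be the maximum possible.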