Consider two or more strings $\mathbf{x}^1,\mathbf{x}^2,\ldots$ that are concatenated to form $\mathbf{x}=\langle \mathbf{x}^1,\mathbf{x}^2,\ldots \rangle$. Suppose that up to $\delta$ deletions occur in each of the concatenated strings. Since deletions alter the lengths of the strings, a fundamental question is: how much redundancy must be introduced in $\mathbf{x}$ in order to recover the boundaries of $\mathbf{x}^1,\mathbf{x}^2,\ldots$? This boundary problem is equivalent to the problem of designing codes that can detect the exact number of deletions in each concatenated string. In this work, we answer this question by first deriving converse results that give lower bounds on the redundancy of deletion-detecting codes. We then present a marker-based code construction whose redundancy is asymptotically optimal in $\delta$ among all families of deletion-detecting codes, and exactly optimal among all block-by-block decodable codes. To illustrate the usefulness of such deletion-detecting codes, we apply our code to trace reconstruction and design an efficient coded reconstruction scheme that requires only a constant number of traces.
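
As a rough illustration of the boundary problem only (not of the marker-based construction, which is not specified here), the following sketch deletes bits from individual blocks, concatenates the results, and shows that splitting the received string at the original block length misplaces the boundaries. The helper names `delete_positions` and `naive_split`, the block contents, and the deletion positions are all hypothetical choices made for this example.

```python
def delete_positions(block: str, positions: set[int]) -> str:
    """Return `block` with the characters at the given indices deleted."""
    return "".join(ch for i, ch in enumerate(block) if i not in positions)


def naive_split(received: str, block_len: int) -> list[str]:
    """Split at fixed multiples of `block_len`, ignoring any deletions."""
    return [received[i:i + block_len] for i in range(0, len(received), block_len)]


# Three hypothetical blocks x^1, x^2, x^3 of length 4, with delta = 1.
blocks = ["1011", "0010", "1110"]
# One deletion in x^1, none in x^2, one in x^3 (positions chosen arbitrarily).
deletions = [{1}, set(), {0}]

# What each block looks like after its deletions (unknown to the receiver).
received_blocks = [delete_positions(b, d) for b, d in zip(blocks, deletions)]
# The receiver only sees the concatenation, without boundaries.
received = "".join(received_blocks)

# Guessing boundaries from the original block length gives the wrong blocks.
guessed = naive_split(received, block_len=4)

print("true received blocks:", received_blocks)
print("naive guess:         ", guessed)
```

Because the receiver cannot tell how many bits each block lost, the fixed-length guess drifts out of alignment after the first deletion. Knowing the exact number of deletions per block is what restores the boundaries.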