We study the document exchange problem under multiple substring edits. A substring edit in a string $\mathbf{x}$ occurs when a substring $\mathbf{u}$ of $\mathbf{x}$ is replaced by an arbitrary string $\mathbf{v}$, where the lengths of $\mathbf{u}$ and $\mathbf{v}$ are bounded from above by a fixed constant. Let $\mathbf{x}$ and $\mathbf{y}$ be two binary strings that differ by $t$ substring edits. The aim of a document exchange scheme is to construct a short encoding of $\mathbf{x}$ such that $\mathbf{x}$ can be recovered from $\mathbf{y}$ and the encoding. We construct a low-complexity document exchange scheme with an encoding length of $4t\log n+o(\log n)$ bits, where $n$ is the length of $\mathbf{x}$. The best known scheme achieves an encoding length of $4t \log n+O(\log\log n)$ bits, but at a much higher computational complexity. We then investigate the average length of valid encodings for document exchange schemes when $\mathbf{x}$ is drawn uniformly at random, and develop a scheme with an expected encoding length of $(4t-1) \log n+o(\log n)$ bits. In this setting, prior works have only constructed schemes for a single substring edit.
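To make the edit model concrete, the following is a minimal sketch of how $\mathbf{y}$ could arise from $\mathbf{x}$ by applying $t$ substring edits in sequence. It only illustrates the definition of a substring edit; it is not the encoding scheme described above. The function names `substring_edit` and `apply_edits`, the parameter `max_len` (standing in for the fixed constant bounding $|\mathbf{u}|$ and $|\mathbf{v}|$), and the convention that each edit's position indexes into the string produced by the previous edits are all hypothetical choices made for this illustration.

```python
def substring_edit(x: str, start: int, u_len: int, v: str, max_len: int) -> str:
    """Replace u = x[start:start + u_len] by v, where |u|, |v| <= max_len."""
    if start < 0 or start + u_len > len(x):
        raise ValueError("u must be a substring of x")
    if u_len > max_len or len(v) > max_len:
        raise ValueError("|u| and |v| must be at most the constant max_len")
    if set(v) - {"0", "1"}:
        raise ValueError("v must be a binary string")
    return x[:start] + v + x[start + u_len:]


def apply_edits(x: str, edits: list[tuple[int, int, str]], max_len: int) -> str:
    """Apply t = len(edits) substring edits one after another.

    Each edit (start, u_len, v) is indexed into the string produced by the
    previous edits (a modeling assumption made for this sketch).
    """
    y = x
    for start, u_len, v in edits:
        y = substring_edit(y, start, u_len, v, max_len)
    return y


x = "0110100111"
# Edit 1 replaces u = "101" by v = "0"; edit 2 replaces the empty string by "10".
y = apply_edits(x, [(2, 3, "0"), (6, 0, "10")], max_len=3)
print(y)  # 0100011011
```

Since $\mathbf{u}$ may be empty and $\mathbf{v}$ may be empty, this single operation covers bounded-length insertions, deletions, and substitutions. A document exchange scheme would then have to transmit a short encoding of `x` that lets a receiver holding only `y` reconstruct `x`.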