Data synchronization is a fundamental problem with applications in diverse fields such as cloud storage, genomics, and distributed systems. This paper addresses the problem of synchronizing two files, one of which is a subsequence of the other, obtained from it through deletions occurring at a constant rate. Building on prior work, we integrate multi-deletion correction codes into an existing baseline protocol that previously relied on single-deletion correction. The proposed protocol reduces communication cost by combining more general partitioning techniques with multi-deletion correction. We derive a generalized upper bound on the expected number of transmitted bits, applicable to a broad class of deletion correction codes. Experimental results show that our approach achieves a lower communication cost than the baseline. These findings indicate that the improved protocol is effective for low-redundancy synchronization in the presence of deletion errors.