Set reconciliation, in which two parties each hold a set of fixed-length bit strings and run a protocol to learn the strings held by the other party that they are missing, is a fundamental task in many distributed systems. We present Rateless Invertible Bloom Lookup Tables (Rateless IBLT), to the best of our knowledge the first set reconciliation protocol that achieves both low computation cost and near-optimal communication cost across a wide range of scenarios: set differences ranging from one to millions of elements, bit strings ranging from a few bytes to megabytes, and workloads injected by potential adversaries. Rateless IBLT is based on a novel encoder that incrementally encodes the set difference into an infinite stream of coded symbols, resembling rateless error-correcting codes. We compare Rateless IBLT with state-of-the-art set reconciliation schemes and demonstrate significant improvements: Rateless IBLT achieves 3–4× lower communication cost than non-rateless schemes with similar computation cost, and 2–2000× lower computation cost than schemes with similar communication cost. We show the real-world benefits of Rateless IBLT by applying it to synchronize the state of the Ethereum blockchain, where it achieves 5.6× lower end-to-end completion time and 4.4× lower communication cost than the system used in production.
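To make the idea of encoding a set difference into an unbounded stream of coded symbols more concrete, here is a simplified Python sketch of a rateless, IBLT-style symbol stream with a peeling decoder. It is not the authors' implementation. The following are all assumptions made for illustration: items are 64-bit integers, the blake2b checksum, the PRNG-driven mapping from items to symbol indices, the density 1/(1 + ALPHA·i) with ALPHA = 0.5, and every function name (`checksum`, `symbol_indices`, `encode`, `subtract`, `peel`, `reconcile`). Each party encodes its own set. Subtracting the two streams symbol by symbol cancels the shared items, so what remains encodes only the set difference, and a peeling decoder can recover that difference once enough symbols have arrived.

```python
import hashlib
import random

# Assumed density parameter: an item is mixed into coded symbol i with
# probability 1 / (1 + ALPHA * i). Illustrative choice, not a tuned value.
ALPHA = 0.5


def checksum(item: int) -> int:
    """64-bit checksum of a 64-bit item (blake2b chosen for illustration)."""
    digest = hashlib.blake2b(item.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def symbol_indices(item: int, m: int) -> list:
    """Indices in [0, m) of the coded symbols that `item` is mixed into.

    The PRNG is seeded by the item's checksum, so both parties derive the
    same mapping, and the first m indices do not depend on m (rateless)."""
    rng = random.Random(checksum(item))
    return [i for i in range(m) if rng.random() < 1.0 / (1.0 + ALPHA * i)]


def encode(items, m: int) -> list:
    """First m symbols of the conceptually infinite coded-symbol stream.
    Each symbol is [count, XOR of items, XOR of item checksums]."""
    symbols = [[0, 0, 0] for _ in range(m)]
    for x in items:
        h = checksum(x)
        for i in symbol_indices(x, m):
            symbols[i][0] += 1
            symbols[i][1] ^= x
            symbols[i][2] ^= h
    return symbols


def subtract(a: list, b: list) -> list:
    """Symbol-wise difference; items held by both parties cancel out."""
    return [[ca - cb, ka ^ kb, ha ^ hb]
            for (ca, ka, ha), (cb, kb, hb) in zip(a, b)]


def peel(diff: list):
    """Peeling decoder over a prefix of the difference stream.
    Returns (items only in A, items only in B, success flag)."""
    diff = [list(s) for s in diff]
    m = len(diff)
    only_a, only_b = set(), set()
    pending = list(range(m))
    while pending:
        count, key, h = diff[pending.pop()]
        # A symbol is "pure" if it holds exactly one item, which the
        # checksum field confirms (up to a negligible collision chance).
        if count not in (1, -1) or checksum(key) != h:
            continue
        (only_a if count == 1 else only_b).add(key)
        for i in symbol_indices(key, m):  # remove the item everywhere
            diff[i][0] -= count
            diff[i][1] ^= key
            diff[i][2] ^= h
            pending.append(i)
    return only_a, only_b, all(s == [0, 0, 0] for s in diff)


def reconcile(alice_symbols: list, bob_symbols: list):
    """Bob retries decoding after each newly received symbol from Alice.
    Returns (symbols used, only in Alice, only in Bob), or None."""
    for m in range(1, len(alice_symbols) + 1):
        only_a, only_b, ok = peel(subtract(alice_symbols[:m], bob_symbols[:m]))
        if ok:
            return m, only_a, only_b
    return None


if __name__ == "__main__":
    rng = random.Random(7)
    shared = {rng.getrandbits(64) for _ in range(1000)}
    alice = shared | {rng.getrandbits(64) for _ in range(5)}
    bob = shared | {rng.getrandbits(64) for _ in range(3)}
    # Symbols could be generated indefinitely; 200 is just a cap for the demo.
    result = reconcile(encode(alice, 200), encode(bob, 200))
    if result is None:
        print("need more than 200 symbols")
    else:
        m, only_a, only_b = result
        print(f"decoded after {m} symbols: {len(only_a)} + {len(only_b)} items")
```

Because an item's mapping to symbol indices never depends on how many symbols are generated, a longer prefix always extends a shorter one. The sender can therefore keep streaming symbols until the receiver's decode succeeds, which is the sense in which such a stream is rateless. For readability, the sketch rescans every index when computing an item's mapping. A practical encoder would generate each item's next index incrementally instead.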