We consider the problem of reconstructing the symmetric difference between similar sets from their representations (sketches) whose size is linear in the number of differences. Exact solutions to this problem are based on error-correcting codes and suffer from long decoding times. Existing probabilistic solutions based on Invertible Bloom Lookup Tables (IBLTs) are time-efficient, but their success guarantees are insufficient for many applications. Here we propose a tunable trade-off between the two approaches, combining the efficiency of IBLTs with an exponentially decreasing failure probability. The proof relies on a refined analysis of IBLTs proposed in (Baek Tejs Houen et al., SOSA 2023), which is of independent interest. We also propose a modification of our algorithm that makes it possible to tell which set each element of the symmetric difference belongs to.
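For readers unfamiliar with IBLT-based reconciliation, the following is a minimal sketch of the standard baseline technique, not of the algorithm proposed here. Each set is summarized by a table of cells; the two tables are subtracted cell by cell, and the difference is recovered by repeatedly peeling cells that hold exactly one element. The sign of a peeled cell's counter indicates which set the element came from. All names (`IBLT`, `_h`, `K`, `cells_per_hash`) and the parameter values are illustrative assumptions, and elements are assumed to be non-negative integers below 2^64.

```python
import hashlib

K = 3  # number of hash functions; an assumed, commonly used value


def _h(x: int, salt: int) -> int:
    """Deterministic 64-bit hash of x; a different salt gives a different function."""
    data = bytes([salt]) + x.to_bytes(8, "big")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class IBLT:
    def __init__(self, cells_per_hash: int):
        self.m = cells_per_hash
        n = K * cells_per_hash          # one sub-table per hash function
        self.count = [0] * n            # signed number of elements in the cell
        self.key_xor = [0] * n          # XOR of the elements in the cell
        self.chk_xor = [0] * n          # XOR of the element checksums

    def _cells(self, x: int):
        # Hash function i only addresses sub-table i, so the K cells are distinct.
        return [i * self.m + _h(x, i) % self.m for i in range(K)]

    def _update(self, x: int, sign: int) -> None:
        for c in self._cells(x):
            self.count[c] += sign
            self.key_xor[c] ^= x
            self.chk_xor[c] ^= _h(x, K)  # salt K is reserved for the checksum

    def insert(self, x: int) -> None:
        self._update(x, 1)

    def subtract(self, other: "IBLT") -> "IBLT":
        # Elements common to both sets cancel out in every field.
        out = IBLT(self.m)
        for c in range(K * self.m):
            out.count[c] = self.count[c] - other.count[c]
            out.key_xor[c] = self.key_xor[c] ^ other.key_xor[c]
            out.chk_xor[c] = self.chk_xor[c] ^ other.chk_xor[c]
        return out

    def decode(self):
        """Peel single-element cells; returns (success, only_in_first, only_in_second)."""
        only_a, only_b = set(), set()
        progress = True
        while progress:
            progress = False
            for c in range(len(self.count)):
                sign = self.count[c]
                if sign in (1, -1) and self.chk_xor[c] == _h(self.key_xor[c], K):
                    x = self.key_xor[c]
                    (only_a if sign == 1 else only_b).add(x)
                    self._update(x, -sign)  # remove x from all of its cells
                    progress = True
        ok = not (any(self.count) or any(self.key_xor) or any(self.chk_xor))
        return ok, only_a, only_b


# Two sets of 1000 elements each that differ in 10 elements.
set_a, set_b = set(range(1000)), set(range(5, 1005))
sketch_a, sketch_b = IBLT(64), IBLT(64)
for x in set_a:
    sketch_a.insert(x)
for x in set_b:
    sketch_b.insert(x)
ok, only_a, only_b = sketch_a.subtract(sketch_b).decode()
```

The table size depends only on the expected number of differences, not on the sizes of the sets. Decoding can fail when the peeling process gets stuck, in which case `ok` is `False`; this failure probability is the quantity that the trade-off described in the abstract aims to reduce.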