In this work, we study Invertible Bloom Lookup Tables (IBLTs) with small failure probabilities. IBLTs are highly versatile data structures that have found applications in set reconciliation protocols, error-correcting codes, and even the design of advanced cryptographic primitives. For storing $n$ elements and ensuring correctness with probability at least $1 - \delta$, existing IBLT constructions require $\Omega(n(\frac{\log(1/\delta)}{\log(n)}+1))$ space, and they crucially rely on fully random hash functions. We present new constructions of IBLTs that are simultaneously more space-efficient and require less randomness. For storing $n$ elements with a failure probability of at most $\delta$, our data structure requires only $\mathcal{O}(n + \log(1/\delta)\log\log(1/\delta))$ space and $\mathcal{O}(\log(\log(n)/\delta))$-wise independent hash functions. As a key technical ingredient, we show that hashing $n$ keys with any $k$-wise independent hash function $h:U \to [Cn]$, for some sufficiently large constant $C$, guarantees with probability $1 - 2^{-\Omega(k)}$ that at least $n/2$ keys will have a unique hash value. Proving this is highly non-trivial as $k$ approaches $n$. We believe that the techniques used to prove this statement may be of independent interest.
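The key technical statement (hashing $n$ keys into $[Cn]$ with a $k$-wise independent function leaves at least $n/2$ keys with a unique hash value) can be explored empirically. The following is a minimal sketch, not the construction from the paper: it assumes the textbook polynomial family (a random polynomial of degree $k-1$ over a prime field, reduced modulo the table size), which is only approximately uniform after the final reduction. The names `make_kwise_hash` and `count_unique`, the prime `P`, and the values of `n`, `C`, and `k` are illustrative assumptions.

```python
import random
from collections import Counter

# Assumption: a Mersenne prime larger than every key we hash.
P = (1 << 61) - 1


def make_kwise_hash(k, m, rng):
    """Return h(x) = (random degree-(k-1) polynomial in x mod P) mod m.

    Over the field Z_P this family is k-wise independent; the final
    reduction mod m makes the output only approximately uniform.
    """
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x):
        acc = 0
        for c in coeffs:  # Horner evaluation of the polynomial
            acc = (acc * x + c) % P
        return acc % m

    return h


def count_unique(keys, h):
    """Count the keys whose hash value is shared with no other key."""
    values = [h(x) for x in keys]
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1)


if __name__ == "__main__":
    n, C, k = 1000, 8, 8          # illustrative parameters, not from the paper
    rng = random.Random(0)
    keys = list(range(n))
    h = make_kwise_hash(k, C * n, rng)
    unique = count_unique(keys, h)
    print(f"{unique} of {n} keys have a unique hash value")
```

Repeating the experiment over many seeds and smaller values of `C` gives a rough feel for how often the $n/2$ threshold is met; it does not substitute for the probability bound $1 - 2^{-\Omega(k)}$ stated above.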