In this work we study Invertible Bloom Lookup Tables (IBLTs) with small failure probabilities. IBLTs are highly versatile data structures that have found applications in set reconciliation protocols, error-correcting codes, and even the design of advanced cryptographic primitives. For storing $n$ elements and ensuring correctness with probability at least $1 - \delta$, existing IBLT constructions require $\Omega(n(\frac{\log(1/\delta)}{\log(n)}+1))$ space and crucially rely on fully random hash functions. We present new constructions of IBLTs that are simultaneously more space-efficient and require less randomness. For storing $n$ elements with a failure probability of at most $\delta$, our data structure requires only $\mathcal{O}(n + \log(1/\delta)\log\log(1/\delta))$ space and $\mathcal{O}(\log(\log(n)/\delta))$-wise independent hash functions. As a key technical ingredient, we show that hashing $n$ keys with any $k$-wise independent hash function $h:U \to [Cn]$, for some sufficiently large constant $C$, guarantees with probability $1 - 2^{-\Omega(k)}$ that at least $n/2$ keys will have a unique hash value. Proving this becomes highly non-trivial as $k$ approaches $n$. We believe that the techniques used to prove this statement may be of independent interest.
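The key technical statement in the abstract (at least $n/2$ of $n$ keys receive a unique hash value under a $k$-wise independent $h:U \to [Cn]$) can be explored empirically. The following is a minimal sketch, not the paper's construction: it assumes the textbook polynomial family over a prime field as a stand-in for a $k$-wise independent hash function, and the final reduction modulo $Cn$ makes the output only approximately uniform. The names `make_poly_hash`, `count_unique`, and `demo`, as well as the parameter choices $n = 1000$, $C = 8$, $k = 16$, are illustrative assumptions and do not come from the source.

```python
import random
from collections import Counter

# Assumption: keys are integers smaller than this Mersenne prime.
P = (1 << 61) - 1


def make_poly_hash(k, m, rng):
    """Return h(x) = (random polynomial of degree k-1 over Z_P)(x) mod m.

    A random degree-(k-1) polynomial over a prime field is the standard
    k-wise independent family; reducing mod m afterwards is a common
    shortcut that is only approximately uniform on [m].
    """
    coeffs = [rng.randrange(P) for _ in range(k)]

    def h(x):
        acc = 0
        for c in coeffs:  # Horner evaluation modulo P
            acc = (acc * x + c) % P
        return acc % m

    return h


def count_unique(keys, h):
    """Count the keys whose hash value is shared with no other key."""
    buckets = Counter(h(x) for x in keys)
    return sum(1 for x in keys if buckets[h(x)] == 1)


def demo(n=1000, C=8, k=16, seed=0):
    """Hash n distinct random keys into [C*n] and count the unique ones."""
    rng = random.Random(seed)
    keys = rng.sample(range(1, 10**9), n)
    h = make_poly_hash(k, C * n, rng)
    return count_unique(keys, h)


if __name__ == "__main__":
    n = 1000
    unique = demo(n=n)
    print(f"{unique} of {n} keys have a unique hash value")
```

Running `demo` with different seeds and smaller values of `C` gives a feel for how large the constant has to be before the $n/2$ threshold is met comfortably. Such an experiment only illustrates typical behaviour; it says nothing about the $2^{-\Omega(k)}$ tail bound that the abstract claims.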