Despite being among the oldest data structures in computer science, hash tables continue to be the focus of a great deal of both theoretical and empirical research. A central reason is that many of the fundamental properties one desires from a hash table are difficult to achieve simultaneously; as a result, many variants offering different trade-offs have been proposed. This paper introduces Iceberg hashing, a hash table that simultaneously offers the strongest known guarantees on a large number of core properties. Iceberg hashing supports constant-time operations while improving on the state of the art for space efficiency, cache efficiency, and low failure probability. Iceberg hashing is also the first hash table to support a load factor of up to $1 - o(1)$ while being stable, meaning that the position where an element is stored changes only when resizes occur. In fact, in the setting where keys are $\Theta(\log n)$ bits, the space guarantee that Iceberg hashing offers, namely that it uses at most $\log \binom{|U|}{n} + O(n \log \log n)$ bits to store $n$ items from a universe $U$, matches a lower bound by Demaine et al. that applies to any stable hash table. Iceberg hashing introduces new general-purpose techniques for some of the most basic aspects of hash table design. Notably, our indirection-free technique for dynamic resizing, which we call waterfall addressing, and our techniques for achieving stability and very-high-probability guarantees can be applied to any hash table that follows the front-yard/backyard paradigm.
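To make the space bound concrete, the sketch below evaluates $\log \binom{|U|}{n}$ for a hypothetical table of $n = 2^{12}$ keys drawn from a universe of size $|U| = n^2$, so that each key is $2\log_2 n = 24$ bits, i.e. $\Theta(\log n)$. It is only an illustration of the quantity in the bound, not part of Iceberg hashing itself. The function names, the choice of base-2 logarithms, and the decision to set the constant hidden in the $O(n \log \log n)$ term to 1 are all assumptions made for this example.

```python
import math


def info_theoretic_bits(universe_size: int, n: int) -> float:
    """log2 C(|U|, n): bits needed to identify one n-element subset of U."""
    return math.log2(math.comb(universe_size, n))


def naive_bits(universe_size: int, n: int) -> int:
    """Bits used by storing each of the n keys verbatim in ceil(log2 |U|) bits."""
    return n * math.ceil(math.log2(universe_size))


def loglog_slack(n: int) -> float:
    """n * log2(log2 n): the O(n log log n) term with its constant assumed to be 1."""
    return n * math.log2(math.log2(n))


if __name__ == "__main__":
    n = 1 << 12          # 4096 items (hypothetical example size)
    universe = n ** 2    # 24-bit keys, i.e. Theta(log n) bits per key
    lower = info_theoretic_bits(universe, n)
    print(f"log2 C(|U|, n)       ~ {lower:,.0f} bits")
    print(f"+ n log log n slack  ~ {lower + loglog_slack(n):,.0f} bits")
    print(f"naive n * log2|U|    = {naive_bits(universe, n):,} bits")
```

Because $\binom{|U|}{n} \le (e|U|/n)^n$, the first term is at most about $n(\log_2(|U|/n) + \log_2 e)$ bits, roughly 13.4 bits per key here, compared with 24 bits per key for verbatim storage. That gap is what a bound of the form $\log \binom{|U|}{n} + O(n \log \log n)$ captures.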