A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x \in [U]$, it returns whether $x$ is in the set. Some variants also store a value associated with each key, such that given a query $x$, the value associated with $x$ is returned when $x$ is in the set. This fundamental data structure problem has been studied for six decades since the introduction of hash tables in 1953. A hash table occupies $O(n\log U)$ bits of space and supports each operation in constant expected time. There is a vast literature on improving its time and space usage. The state-of-the-art dictionary by Bender, Farach-Colton, Kuszmaul, Kuszmaul and Liu [BFCK+22] has space consumption close to the information-theoretic optimum, using a total of \[ \log\binom{U}{n}+O(n\log^{(k)} n) \] bits, while supporting all operations in $O(k)$ time, for any parameter $k \leq \log^* n$. The term $O(\log^{(k)} n) = O(\underbrace{\log\cdots\log}_k n)$ is referred to as the wasted bits per key. In this paper, we prove a matching cell-probe lower bound: for $U=n^{1+\Theta(1)}$, any dictionary with $O(\log^{(k)} n)$ wasted bits per key must have expected operation time $\Omega(k)$ in the cell-probe model with word size $w=\Theta(\log U)$. Furthermore, if a dictionary stores values of $\Theta(\log U)$ bits, we show that regardless of the query time, it must have $\Omega(k)$ expected update time. Notably, this is the first cell-probe lower bound on the trade-off between space and update time for general data structures.
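To make the space terms concrete, the following is a minimal numerical sketch, not part of the paper's construction or proof. It evaluates the information-theoretic optimum $\log\binom{U}{n}$, the iterated logarithm $\log^{(k)} n$, and the wasted bits per key of a given total space budget. The function names (`log2_binom`, `iterated_log2`, `wasted_bits_per_key`) are hypothetical, logarithms are assumed to be base 2, and the iterated logarithm is clamped at 1 so that it stays defined for every $k$.

```python
import math


def log2_binom(U: int, n: int) -> float:
    """Return log2 of C(U, n), computed via lgamma to avoid huge integers."""
    nats = math.lgamma(U + 1) - math.lgamma(n + 1) - math.lgamma(U - n + 1)
    return nats / math.log(2)


def iterated_log2(n: float, k: int) -> float:
    """Return log^{(k)} n: log2 applied k times (assumption: clamped at 1)."""
    x = float(n)
    for _ in range(k):
        x = max(1.0, math.log2(max(x, 1.0)))
    return x


def wasted_bits_per_key(total_bits: float, U: int, n: int) -> float:
    """Return the space used beyond log2 C(U, n), divided by the n keys."""
    return (total_bits - log2_binom(U, n)) / n


if __name__ == "__main__":
    n = 2 ** 20
    U = n ** 2  # one instance of U = n^{1 + Theta(1)}
    print("optimum bits:", log2_binom(U, n))
    # Baseline for comparison: every key stored verbatim in log2(U) bits.
    print("verbatim waste per key:", wasted_bits_per_key(n * math.log2(U), U, n))
    for k in range(1, 5):
        print(f"log^({k}) n =", iterated_log2(n, k))
```

The verbatim baseline wastes roughly $\log n$ bits per key, which corresponds to the $k = 1$ end of the trade-off. Each additional level of $k$ shrinks the wasted bits per key by another logarithm, at the cost of $O(k)$ operation time.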