A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x \in [U]$, it returns whether $x$ is in the set. Some variants also store values associated with the keys, such that given a query $x$, the value associated with $x$ is returned when $x$ is in the set. This fundamental data structure problem has been studied since the introduction of hash tables in 1953. A hash table occupies $O(n\log U)$ bits of space with constant expected time per operation. There is a vast literature on improving its time and space usage. The state-of-the-art dictionary by Bender, Farach-Colton, Kuszmaul, Kuszmaul and Liu [BFCK+22] has space consumption close to the information-theoretic optimum, using a total of \[ \log\binom{U}{n}+O(n\log^{(k)} n) \] bits, while supporting all operations in $O(k)$ time, for any parameter $k \leq \log^* n$. The term $O(\log^{(k)} n) = O(\underbrace{\log\cdots\log}_k n)$ is referred to as the number of wasted bits per key. In this paper, we prove a matching cell-probe lower bound: for $U=n^{1+\Theta(1)}$, any dictionary with $O(\log^{(k)} n)$ wasted bits per key must have expected operational time $\Omega(k)$ in the cell-probe model with word size $w=\Theta(\log U)$. Furthermore, if a dictionary stores values of $\Theta(\log U)$ bits, we show that regardless of the query time, it must have $\Omega(k)$ expected update time. This is the first cell-probe lower bound on the trade-off between space and update time for general data structures.
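To make the space bound concrete, the following is a minimal numerical sketch. It assumes base-2 logarithms and the illustrative parameters $n = 2^{20}$ and $U = n^2$ (one instance of $U = n^{1+\Theta(1)}$). These values are our own choice, not from the paper. The helper names `log2_binom`, `iterated_log2` and `log_star` are hypothetical. The code only evaluates the formulas and ignores the constants hidden in $O(\cdot)$. It is not a dictionary implementation.

```python
import math


def log2_binom(U: int, n: int) -> float:
    """log2 of C(U, n), the information-theoretic minimum number of bits.

    Computed with lgamma so that huge binomial coefficients are never
    materialised as integers.
    """
    ln = math.lgamma(U + 1) - math.lgamma(n + 1) - math.lgamma(U - n + 1)
    return ln / math.log(2)


def iterated_log2(n: float, k: int) -> float:
    """log^{(k)} n: apply log2 to n exactly k times (k = 0 returns n)."""
    x = float(n)
    for _ in range(k):
        x = math.log2(x)  # raises ValueError if x drops to <= 0
    return x


def log_star(n: float) -> int:
    """log* n: how many times log2 is applied before the value is <= 1."""
    count, x = 0, float(n)
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


if __name__ == "__main__":
    # Illustrative parameters (assumption): n = 2^20 keys, U = n^2.
    n = 2 ** 20
    U = n ** 2
    optimum = log2_binom(U, n)
    naive = n * math.log2(U)  # n keys stored verbatim, log U bits each
    print(f"log2 C(U, n) / n : {optimum / n:.2f} bits per key")
    print(f"n log U / n      : {naive / n:.2f} bits per key")
    # Wasted bits per key for each admissible k <= log* n (constants ignored).
    for k in range(1, log_star(n) + 1):
        print(f"k = {k}: log^({k}) n = {iterated_log2(n, k):.3f}")
```

For these parameters, the optimum is roughly $\log(U/n) + \log e \approx 21.4$ bits per key, compared with 40 bits for storing each key verbatim. The printed $\log^{(k)} n$ values show how quickly the per-key overhead shrinks as the allowed time $k$ grows toward $\log^* n$.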