Tight Cell-Probe Lower Bounds for Dynamic Succinct Dictionaries

A dictionary data structure maintains a set of at most $n$ keys from the universe $[U]$ under key insertions and deletions, such that given a query $x \in [U]$, it reports whether $x$ is in the set. Some variants also store a value associated with each key, so that a query $x$ returns the value associated with $x$ whenever $x$ is in the set. This fundamental data structure problem has been studied for six decades, since the introduction of hash tables in 1953. A hash table occupies $O(n\log U)$ bits of space and takes constant expected time per operation. A vast literature has sought to improve its time and space usage. The state-of-the-art dictionary by Bender, Farach-Colton, Kuszmaul, Kuszmaul and Liu [BFCK+22] has space consumption close to the information-theoretic optimum, using a total of \[ \log\binom{U}{n}+O(n\log^{(k)} n) \] bits, while supporting all operations in $O(k)$ time, for any parameter $k \leq \log^* n$. The term $O(\log^{(k)} n) = O(\underbrace{\log\cdots\log}_k n)$ is referred to as the wasted bits per key. In this paper, we prove a matching cell-probe lower bound: for $U=n^{1+\Theta(1)}$, any dictionary with $O(\log^{(k)} n)$ wasted bits per key must have expected time $\Omega(k)$ per operation, in the cell-probe model with word-size $w=\Theta(\log U)$. Furthermore, if a dictionary stores values of $\Theta(\log U)$ bits, we show that regardless of the query time, it must have $\Omega(k)$ expected update time. Notably, this is the first cell-probe lower bound on the trade-off between space and update time for general data structures.
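To make the quantities in the space bound above concrete, the following minimal sketch evaluates the information-theoretic optimum $\log\binom{U}{n}$ and the iterated logarithm $\log^{(k)} n$ numerically. The function names `log2_binom` and `iterated_log2` are hypothetical and chosen only for illustration. The sketch assumes base-2 logarithms and ignores the constants hidden by the $O(\cdot)$ notation, so it shows only how quickly the wasted bits per key shrink as $k$ grows. It is not part of the construction or of the lower-bound proof.

```python
import math


def log2_binom(U: int, n: int) -> float:
    """Return log2 of C(U, n), the information-theoretic minimum number
    of bits needed to represent an n-key subset of a universe of size U.

    Uses C(U, n) = prod_{i=0}^{n-1} (U - i) / (i + 1) and sums logarithms,
    so no huge integers are formed.
    """
    return sum(math.log2(U - i) - math.log2(i + 1) for i in range(n))


def iterated_log2(n: float, k: int) -> float:
    """Return log^{(k)} n, i.e. log2 applied k times to n.

    Raises ValueError if the argument drops to 1 or below before the k-th
    application, which corresponds to k exceeding log* n.
    """
    x = float(n)
    for _ in range(k):
        if x <= 1.0:
            raise ValueError("k exceeds log* n for this n")
        x = math.log2(x)
    return x


if __name__ == "__main__":
    n = 2 ** 10   # illustrative parameters (assumption): U = n^2
    U = n ** 2
    optimum = log2_binom(U, n)
    print(f"n log U        = {n * math.log2(U):.1f} bits")
    print(f"log C(U, n)    = {optimum:.1f} bits")
    for k in range(1, 4):
        print(f"log^({k}) n      = {iterated_log2(n, k):.3f} bits per key")
```

Running the script with other values of `n` and `U` shows the gap between the $n \log U$ bits of a plain hash table and the optimum, and how each extra level of iteration reduces the per-key overhead term.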
