In the problem of perfect hashing, we are given a size-$k$ subset $\mathcal{A}$ of a universe of keys $[n] = \{1, 2, \ldots, n\}$, and we wish to construct a hash function $h: [n] \to [b]$ that maps $\mathcal{A}$ to $[b]$ with no collisions, i.e., the restriction of $h(\cdot)$ to $\mathcal{A}$ is injective. When $b=k$, the problem is referred to as minimal perfect hashing. In this paper, we extend the study of minimal perfect hashing to the approximate setting. For $\alpha \in [0, 1]$, we say that a randomized hashing scheme is $\alpha$-perfect if, for any input $\mathcal{A}$ of size $k$, it outputs a hash function that exhibits at most $(1-\alpha)k$ collisions on $\mathcal{A}$ in expectation. An important performance consideration for any hashing scheme is the space required to store the hash function. For minimal perfect hashing, i.e., $b = k$, it is well known that approximately $k\log_2(e)$ bits, or $\log_2(e) \approx 1.44$ bits per key, are required to store the hash function. In this paper, we propose schemes for constructing minimal $\alpha$-perfect hash functions and analyze their space requirements. We begin by presenting a simple baseline scheme that randomizes between perfect hashing and zero-bit random hashing. We then present a more sophisticated sampling-based hashing scheme that significantly improves upon the space requirement of the baseline scheme for all values of $\alpha$.
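To make the baseline idea concrete, the following Python sketch illustrates one way a scheme could randomize between a perfect hash and zero-bit random hashing. It is an illustration under stated assumptions, not the construction analyzed in the paper. In particular, it assumes that the number of collisions is counted as $k$ minus the number of distinct buckets occupied, and that a perfect hash costs about $\log_2(e)$ bits per key. The function names and the mixing rule $p = 1 - e(1-\alpha)$ are hypothetical and follow only from those assumptions.

```python
import math
import random


def random_hash_collision_fraction(k):
    """Expected collisions per key when k keys are hashed uniformly into k buckets.

    ASSUMPTION: collisions = k - (number of distinct occupied buckets).
    Under that definition the expectation is k * (1 - 1/k)**k, which is
    at most k / e for every k >= 1.
    """
    return (1.0 - 1.0 / k) ** k


def baseline_mixing_probability(alpha):
    """Probability p of storing a perfect hash instead of using random hashing.

    Chosen so that (1 - p) / e <= 1 - alpha, using the conservative bound
    of 1/e collisions per key for random hashing. Clamped to [0, 1].
    """
    p = 1.0 - math.e * (1.0 - alpha)
    return min(1.0, max(0.0, p))


def baseline_bits_per_key(alpha):
    """Expected storage per key: p * log2(e) bits (random hashing costs zero bits)."""
    return baseline_mixing_probability(alpha) * math.log2(math.e)


def simulate_collision_fraction(k, alpha, trials, seed=0):
    """Monte Carlo estimate of expected collisions per key under the mixed scheme."""
    rng = random.Random(seed)
    p = baseline_mixing_probability(alpha)
    total = 0
    for _ in range(trials):
        if rng.random() < p:
            # Perfect hash branch: injective on the key set, so no collisions.
            collisions = 0
        else:
            # Random hashing branch: each key picks a bucket uniformly at random.
            occupied = {rng.randrange(k) for _ in range(k)}
            collisions = k - len(occupied)
        total += collisions
    return total / (trials * k)
```

Under these assumptions, random hashing alone already meets the target whenever $\alpha \le 1 - 1/e$, so the sketch stores nothing in that range. Calling `simulate_collision_fraction(200, 0.8, 2000)` should return a value close to the target $1-\alpha = 0.2$.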