In the problem of minimal perfect hashing, we are given a size-$k$ subset $\mathcal{A}$ of a universe of keys $[n] = \{1, 2, \ldots, n\}$, and we wish to construct a hash function $h: [n] \to [k]$ that maps $\mathcal{A}$ to $[k]$ with no collisions, i.e., the restriction of $h$ to $\mathcal{A}$ is injective. In this paper, we extend the study of minimal perfect hashing to the approximate setting. For $\alpha \in [0, 1]$, we say that a randomized hashing scheme is $\alpha$-perfect if, for any input $\mathcal{A}$ of size $k$, it outputs a hash function that exhibits at most $(1-\alpha)k$ collisions on $\mathcal{A}$ in expectation. An important performance consideration for any hashing scheme is the space required to store the hash function. For minimal perfect hashing, it is well known that approximately $k\log_2 e$ bits, or $\log_2 e \approx 1.44$ bits per key, are required. We propose schemes for constructing minimal $\alpha$-perfect hash functions and analyze their space requirements. We begin with a simple baseline scheme that randomizes between perfect hashing and zero-bit random hashing. We then present a more sophisticated sampling-based scheme that significantly improves upon the space requirement of the baseline for all values of $\alpha$.
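The baseline idea of randomizing between perfect hashing and zero-bit random hashing can be illustrated numerically. The sketch below is not the paper's analysis. It assumes one particular definition of a collision (number of keys minus number of distinct buckets used), assumes the random hash is uniform over $[k]$, takes the perfect-hash cost to be the approximate $\log_2 e$ bits per key quoted above, and ignores any overhead for recording which branch was taken. All function names are hypothetical.

```python
import math
import random


def count_collisions(hashes):
    """Collisions under an ASSUMED definition: keys minus distinct buckets used."""
    return len(hashes) - len(set(hashes))


def expected_random_collisions(k):
    """Exact expected collisions when k keys are hashed uniformly into k buckets.

    The expected number of empty buckets is k * (1 - 1/k)**k, which equals
    the expected value of count_collisions in this setting (about k / e).
    """
    return k * (1 - 1 / k) ** k


def baseline_mixing_probability(k, alpha):
    """Smallest probability p of choosing the perfect hash such that the
    mixture has at most (1 - alpha) * k collisions in expectation."""
    c = expected_random_collisions(k)
    if c == 0:  # k == 1: random hashing never collides
        return 0.0
    p = 1 - (1 - alpha) * k / c
    return min(1.0, max(0.0, p))


def baseline_bits_per_key(k, alpha):
    """Expected storage per key: p * log2(e) bits (perfect branch only)."""
    return baseline_mixing_probability(k, alpha) * math.log2(math.e)


def simulate_random_hashing(k, trials, seed=0):
    """Monte Carlo estimate of the mean collision count for random hashing."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hashes = [rng.randrange(k) for _ in range(k)]
        total += count_collisions(hashes)
    return total / trials


if __name__ == "__main__":
    k = 1000
    print("simulated collisions:", simulate_random_hashing(k, trials=200))
    print("expected collisions: ", expected_random_collisions(k))
    for alpha in (0.5, 0.7, 0.9, 1.0):
        print(alpha, round(baseline_bits_per_key(k, alpha), 3))
```

Under these assumptions, random hashing alone already meets any target $\alpha \le 1 - (1 - 1/k)^k \approx 1 - 1/e$, so the mixture spends bits only for larger $\alpha$, and its cost grows linearly in $\alpha$ up to $\log_2 e$ bits per key at $\alpha = 1$.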