In the problem of perfect hashing, we are given a size $k$ subset $\mathcal{A}$ of a universe of keys $[n] = \{1,2, \cdots, n\}$, for which we wish to construct a hash function $h: [n] \to [b]$ such that $h(\cdot)$ maps $\mathcal{A}$ to $[b]$ with no collisions, i.e., the restriction of $h(\cdot)$ to $\mathcal{A}$ is injective. When $b=k$, the problem is referred to as minimal perfect hashing. In this paper, we extend the study of minimal perfect hashing to the approximate setting. For some $α\in [0, 1]$, we say that a randomized hashing scheme is $α$-perfect if for any input $\mathcal{A}$ of size $k$, it outputs a hash function which exhibits at most $(1-α)k$ collisions on $\mathcal{A}$ in expectation. One important performance consideration for any hashing scheme is the space required to store the hash functions. For minimal perfect hashing, i.e., $b = k$, it is well known that approximately $k\log(e)$ bits, or $\log(e)$ bits per key, is required to store the hash function. In this paper, we propose schemes for constructing minimal $α$-perfect hash functions and analyze their space requirements. We begin by presenting a simple base-line scheme which randomizes between perfect hashing and zero-bit random hashing. We then present a more sophisticated hashing scheme based on sampling which significantly improves upon the space requirement of the aforementioned strategy for all values of $α$.
翻译:暂无翻译