Perfect hash functions assign unique "names" to arbitrary keys while requiring only a few bits per key. They are an essential building block in applications such as static hash tables, databases, and bioinformatics. This paper introduces PHast, an approach that combines the fastest available queries, very fast construction, and good space consumption (below 2 bits per key). PHast improves on bucket placement, a technique that first hashes each key k to a bucket and then searches for a bucket seed s such that a placement function maps the pairs (s, k) to positions without collisions. PHast can use small-range hash functions with linear mapping, fixed-width encoding of seeds, and parallel construction. This is achieved by restricting each bucket to a small slice of allowed values, with slices overlapping, and by bumping the keys of buckets for which no seed can be found. A variant that we call PHast+ uses additive placement, which enables bit-parallel seed searching and speeds up construction by an order of magnitude.
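To make the bucket-placement idea concrete, the following is a minimal sketch of the generic technique, not the PHast implementation. It hashes keys to buckets, searches a fixed-width seed per bucket, and bumps the keys of any bucket for which no seed works. All names (`hash64`, `build`, `query`, `avg_bucket_size`, `seed_bits`) are hypothetical, BLAKE2b is merely a convenient deterministic stand-in for a fast hash function, and a plain dictionary stands in for whatever structure handles bumped keys. Slices, linear mapping, parallel construction, and the additive placement of PHast+ are deliberately left out.

```python
import hashlib


def hash64(key: str, seed: int) -> int:
    """Seeded 64-bit hash (BLAKE2b is an assumption, used only for determinism)."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def build(keys, avg_bucket_size=4, seed_bits=8):
    """Assign each distinct key a unique value in range(len(keys))."""
    n = len(keys)
    num_buckets = max(1, n // avg_bucket_size)

    # Step 1: hash every key to a bucket (seed 0 is used only for this step).
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[hash64(k, 0) % num_buckets].append(k)

    taken = [False] * n          # which output positions are already used
    seeds = [0] * num_buckets    # 0 is reserved to mean "bucket was bumped"
    bumped = []

    # Step 2: place larger buckets first, since they are the hardest to fit.
    for b in sorted(range(num_buckets), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        # Fixed-width seeds: only values 1 .. 2^seed_bits - 1 are tried.
        for s in range(1, 1 << seed_bits):
            slots = [hash64(k, s) % n for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[p] for p in slots):
                for p in slots:
                    taken[p] = True
                seeds[b] = s
                break
        else:
            # No seed worked: bump the whole bucket to the fallback structure.
            bumped.extend(buckets[b])

    # Step 3: bumped keys receive the positions that are still free
    # (a dict is a stand-in for a more compact fallback level).
    free = [p for p in range(n) if not taken[p]]
    fallback = dict(zip(bumped, free))
    return n, seeds, fallback


def query(mphf, key):
    """Look up the value of a key that was part of the build set."""
    n, seeds, fallback = mphf
    s = seeds[hash64(key, 0) % len(seeds)]
    if s == 0:
        return fallback[key]
    return hash64(key, s) % n


keys = [f"key{i}" for i in range(1000)]
mphf = build(keys)
# Every key gets a distinct value, and together they cover 0 .. n-1.
assert sorted(query(mphf, k) for k in keys) == list(range(len(keys)))
```

In this sketch a query costs one hash to find the bucket, one seed lookup, and one more hash to compute the position. Because every seed fits in `seed_bits` bits, the seed array could be stored with fixed-width encoding; the Python list used here does not attempt to be compact.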