The distribution of keys to a given number of buckets is a fundamental task in distributed data processing and storage. A simple, fast, and therefore popular approach is to map the hash values of keys to buckets based on the remainder after dividing by the number of buckets. Unfortunately, these mappings are not stable when the number of buckets changes, which can lead to severe spikes in system resource utilization, such as network or database requests. Consistent hash algorithms can minimize remappings, but are either significantly slower than the modulo-based approach, require floating-point arithmetic, or are based on a family of hash functions rarely available in standard libraries. This paper introduces JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator. Due to its speed and simple implementation, it can safely replace the modulo-based approach to improve assignment and system stability. A production-ready Java implementation of JumpBackHash has been released as part of the Hash4j open source library.
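The Java sketch below makes the remapping problem concrete. It counts how many keys change buckets when the bucket count grows from 10 to 11. It is not JumpBackHash and does not use the Hash4j API. The consistent hash it uses is Jump Consistent Hash (Lamping and Veach), a well-known algorithm of the floating-point kind mentioned above. The class and method names are hypothetical, and random 64-bit values from `java.util.SplittableRandom` stand in for the hash values of keys.

```java
import java.util.SplittableRandom;

/** Hypothetical demo: fraction of keys that change buckets when the bucket count changes. */
public final class RemappingDemo {

  /** Modulo-based assignment: bucket = hash mod numBuckets (floorMod keeps it non-negative). */
  public static int moduloBucket(long hash, int numBuckets) {
    return (int) Math.floorMod(hash, (long) numBuckets);
  }

  /**
   * Jump Consistent Hash (Lamping and Veach), NOT JumpBackHash. Shown only as an example of a
   * consistent hash that relies on floating-point arithmetic. Returns a bucket in [0, numBuckets).
   */
  public static int jumpConsistentHash(long key, int numBuckets) {
    long b = -1;
    long j = 0;
    while (j < numBuckets) {
      b = j;
      key = key * 2862933555777941757L + 1; // 64-bit LCG step (wraps like unsigned arithmetic)
      j = (long) ((b + 1) * ((double) (1L << 31) / (double) ((key >>> 33) + 1)));
    }
    return (int) b;
  }

  /** Fraction of numKeys random hash values whose bucket differs between the two bucket counts. */
  public static double movedFraction(
      boolean useJump, int bucketsBefore, int bucketsAfter, int numKeys, long seed) {
    SplittableRandom rng = new SplittableRandom(seed);
    int moved = 0;
    for (int i = 0; i < numKeys; i++) {
      long hash = rng.nextLong(); // stands in for the hash value of a key
      int before = useJump ? jumpConsistentHash(hash, bucketsBefore) : moduloBucket(hash, bucketsBefore);
      int after = useJump ? jumpConsistentHash(hash, bucketsAfter) : moduloBucket(hash, bucketsAfter);
      if (before != after) {
        moved++;
      }
    }
    return (double) moved / numKeys;
  }

  public static void main(String[] args) {
    int numKeys = 1_000_000;
    System.out.printf("modulo, 10 -> 11 buckets: %.3f of keys moved%n",
        movedFraction(false, 10, 11, numKeys, 42L));
    System.out.printf("jump,   10 -> 11 buckets: %.3f of keys moved%n",
        movedFraction(true, 10, 11, numKeys, 42L));
  }
}
```

Going from 10 to 11 buckets, a key keeps its modulo bucket only if h mod 10 = h mod 11. For uniformly distributed hash values this holds for just 10 of every 110 residues, so roughly 91% of the keys move. A consistent hash moves only about 1/11 (about 9%) of the keys, and all of them move into the new bucket.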