Traditional GPU hash tables preserve every inserted key, a dictionary assumption that wastes scarce High Bandwidth Memory (HBM) when embedding tables routinely exceed single-GPU capacity. We challenge this assumption with cache semantics, in which policy-driven eviction is a first-class operation. We introduce HierarchicalKV (HKV), the first general-purpose GPU hash table library whose normal full-capacity operating contract is cache-semantic: each full-bucket upsert (update-or-insert) is resolved in place by eviction or admission rejection rather than by rehashing or capacity-induced failure. HKV co-designs four core mechanisms (cache-line-aligned buckets, in-line score-driven upsert, score-based dynamic dual-bucket selection, and triple-group concurrency) and uses tiered key-value separation as a scaling enabler beyond HBM. On an NVIDIA H100 NVL GPU, HKV achieves up to 3.9 billion key-value pairs per second (B-KV/s) find throughput, stable across load factors 0.50-1.00 (<5% variation), and delivers 1.4x higher find throughput than WarpCore (the strongest dictionary-semantic GPU baseline, measured at load factor 0.50) and 2.6x to 9.4x higher than indirection-based GPU baselines. Since its open-source release in October 2022, HKV has been integrated into multiple open-source recommendation frameworks.
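The cache-semantic contract described above can be illustrated with a minimal host-side sketch of a score-driven upsert on one fixed-capacity bucket. This is illustrative only: the function names, data layout, and eviction policy here are assumptions for exposition, not the HKV API or its GPU implementation.

```python
# Toy sketch of a cache-semantic, score-driven upsert on a single
# fixed-capacity bucket. Illustrative only: names and the min-score
# eviction policy are assumptions, not the actual HKV implementation.

BUCKET_CAPACITY = 4  # hypothetical capacity; HKV buckets are cache-line-aligned

def upsert(bucket, key, value, score):
    """Update-or-insert that never rehashes: a full bucket is resolved
    in place by evicting its lowest-score entry or rejecting admission."""
    if key in bucket:                      # update path: key already present
        bucket[key] = (value, score)
        return "updated"
    if len(bucket) < BUCKET_CAPACITY:      # free slot: plain insert
        bucket[key] = (value, score)
        return "inserted"
    # Full bucket: pick the entry with the lowest score as eviction victim.
    victim = min(bucket, key=lambda k: bucket[k][1])
    if score > bucket[victim][1]:          # admit by evicting the victim
        del bucket[victim]
        bucket[key] = (value, score)
        return f"evicted {victim}"
    return "rejected"                      # new key loses; bucket unchanged

bucket = {}
for k, s in [(1, 10), (2, 20), (3, 30), (4, 40)]:
    upsert(bucket, k, "v", s)              # fill the bucket to capacity
print(upsert(bucket, 5, "v", 5))           # score below minimum -> rejected
print(upsert(bucket, 6, "v", 99))          # evicts key 1 (lowest score, 10)
```

Note how the full-bucket case never fails or triggers a rehash: the outcome is always an in-place eviction or an admission rejection, which is what keeps throughput stable as the load factor approaches 1.00.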