GPU hash tables are increasingly used to accelerate data processing, but their limited functionality restricts adoption in large-scale data processing applications. Current limitations include incomplete concurrency support and missing compound operations such as upserts. This paper presents WarpSpeed, a library of high-performance concurrent GPU hash tables with a unified benchmarking framework for performance analysis. WarpSpeed implements eight state-of-the-art Nvidia GPU hash table designs and provides a rich API designed for modern GPU applications. Our evaluation uses diverse benchmarks to assess both correctness and scalability, and we demonstrate real-world impact by integrating these hash tables into three downstream applications. We propose several optimization techniques to reduce concurrency overhead, including fingerprint-based metadata to minimize cache line probes and specialized Nvidia GPU instructions for lock-free queries. Our findings provide new insights into concurrent GPU hash table design and offer practical guidance for developing efficient, scalable data structures on modern GPUs.
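To make the fingerprint idea and the upsert operation mentioned above concrete, here is a minimal single-threaded C++ sketch. It is not the WarpSpeed API: every name in it (`FingerprintTable`, `Bucket`, `tag_of`, `kSlots`) is hypothetical, and it assumes a fixed-size bucket per hash value with no overflow probing and no deletion. Locking, atomics, and GPU-specific instructions are deliberately left out. The point it illustrates is only that a one-byte tag per slot lets a lookup reject most slots by scanning a small metadata array, so full keys are read only on a tag match.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical illustration only; names and layout are assumptions,
// not taken from the WarpSpeed library.
constexpr int kSlots = 8;  // slots per bucket (assumed)

struct Bucket {
    std::array<std::uint8_t, kSlots> tags{};   // 1-byte fingerprints, 0 = empty
    std::array<std::uint64_t, kSlots> keys{};
    std::array<std::uint64_t, kSlots> vals{};
};

// splitmix64 finalizer, used here as a generic 64-bit mixer.
inline std::uint64_t mix(std::uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

// Fingerprint = top byte of the hash; 0 is reserved to mark an empty slot.
inline std::uint8_t tag_of(std::uint64_t h) {
    std::uint8_t t = static_cast<std::uint8_t>(h >> 56);
    return t == 0 ? std::uint8_t{1} : t;
}

class FingerprintTable {
public:
    explicit FingerprintTable(std::size_t n_buckets) : buckets_(n_buckets) {}

    // Upsert: overwrite the value if the key exists, otherwise insert it.
    // Returns false when the key's bucket is full (no overflow handling here).
    bool upsert(std::uint64_t key, std::uint64_t val) {
        const std::uint64_t h = mix(key);
        Bucket& b = buckets_[h % buckets_.size()];
        const std::uint8_t t = tag_of(h);
        int free_slot = -1;
        for (int i = 0; i < kSlots; ++i) {
            if (b.tags[i] == t && b.keys[i] == key) {  // existing key: update
                b.vals[i] = val;
                return true;
            }
            if (b.tags[i] == 0 && free_slot < 0) free_slot = i;
        }
        if (free_slot < 0) return false;
        b.keys[free_slot] = key;
        b.vals[free_slot] = val;
        b.tags[free_slot] = t;  // write the tag last so the slot becomes visible
        return true;
    }

    // Lookup: scan the tag array first; read a full key only on a tag match.
    std::optional<std::uint64_t> find(std::uint64_t key) {
        const std::uint64_t h = mix(key);
        const Bucket& b = buckets_[h % buckets_.size()];
        const std::uint8_t t = tag_of(h);
        for (int i = 0; i < kSlots; ++i) {
            if (b.tags[i] != t) continue;   // rejected by metadata alone
            ++key_compares_;                // counts full-key reads
            if (b.keys[i] == key) return b.vals[i];
        }
        return std::nullopt;
    }

    std::size_t key_compares() const { return key_compares_; }

private:
    std::vector<Bucket> buckets_;
    std::size_t key_compares_ = 0;
};
```

In a GPU setting, the tag array of a bucket would presumably be sized to fit in a single cache line so that a cooperating group of threads can check all fingerprints with one load. A concurrent version would additionally need atomic updates or locks around the slot writes, which this sketch does not attempt.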