FliX: Flipped-Indexing for Scalable GPU Queries and Updates

GPU-based concurrent data structures (CDSs) achieve high throughput for read-only queries, but efficiently supporting dynamic updates on fully GPU-resident data remains challenging. Ordered CDSs (e.g., B-trees and LSM-trees) maintain an index layer that directs operations to a data layer (buckets or leaves), whereas hash tables avoid the cost of maintaining order but cannot support range or successor queries. On GPUs, maintaining and traversing an index layer under frequent updates introduces contention and warp divergence. To address these problems, we flip the indexing paradigm with FliX, a comparison-based, flipped-indexing strategy for dynamic, fully GPU-resident CDSs. Traditional GPU CDSs typically take a batch of operations and assign each operation to a GPU thread or warp. FliX instead assigns compute (e.g., a warp) to each bucket in the data layer, and each bucket then locates the operations it is responsible for in the batch. This lets FliX replace many index-layer traversals with a single binary search on the batch, reducing redundant work and warp divergence. FliX also simplifies updates, since no index layer must be maintained. In our experiments, FliX achieves 6.5x lower query latency than a leading GPU B-tree and 1.5x lower query latency than a leading GPU LSM-tree, while delivering 4x higher throughput per memory footprint than ordered competitors. Despite maintaining order, FliX also surpasses state-of-the-art unordered GPU hash tables in query and deletion performance, and is highly competitive in insertion performance. In update-heavy workloads, it outperforms the closest fully dynamic ordered baseline by over 8x in insertion throughput while supporting dynamic memory reclamation. These results suggest that eliminating the index layer and adopting a compute-to-bucket mapping can enable practical, fully dynamic GPU indexing without sacrificing query performance.
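
The compute-to-bucket mapping described above can be illustrated with a small CPU-side sketch. This is not the FliX implementation: the bucket layout (a key range `[lo, hi)` plus a sorted key list), the function names `ops_for_bucket` and `flipped_lookup`, and the use of two binary searches to bound each bucket's slice of the batch are all assumptions made for illustration. The loop over buckets stands in for what would be one warp per bucket on a GPU.

```python
from bisect import bisect_left


def ops_for_bucket(sorted_batch, lo, hi):
    """Return the batch keys that fall in the bucket's range [lo, hi).

    The bucket searches the sorted batch instead of each operation
    traversing an index layer to find its bucket.
    """
    start = bisect_left(sorted_batch, lo)
    end = bisect_left(sorted_batch, hi)
    return sorted_batch[start:end]


def flipped_lookup(buckets, batch):
    """Answer point queries by letting each bucket pull its own operations.

    buckets: list of (lo, hi, sorted_keys) tuples with disjoint ranges
             (hypothetical layout, chosen for this sketch).
    batch:   list of query keys.
    Returns a dict mapping each query key to True/False (found or not).
    """
    sorted_batch = sorted(batch)
    results = {}
    # On a GPU, each iteration would be handled by its own warp.
    for lo, hi, keys in buckets:
        for q in ops_for_bucket(sorted_batch, lo, hi):
            i = bisect_left(keys, q)
            results[q] = i < len(keys) and keys[i] == q
    return results


buckets = [(0, 10, [1, 4, 7]), (10, 20, [12, 15]), (20, 30, [21, 29])]
found = flipped_lookup(buckets, [15, 4, 22, 9])
```

In this sketch the work per bucket depends only on the bucket's range and the sorted batch, so no shared index structure is read or modified, which is the property the abstract attributes to the flipped design.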
