Sorting and binary searching a dense array can be considered the simplest and most space-efficient form of indexing. This holds especially on GPUs, as they exhibit exceptional sorting performance. However, the popular opinion is that such a primitive approach cannot compete with large, highly sophisticated GPU index structures in terms of lookup performance and hence should not be considered in practice. In this work, we investigate whether binary search still deserves this reputation or whether it can be a fast and space-minimal alternative to more heavyweight index structures, in particular when it exploits the advancements of current highly parallel GPU architectures. To find out, we introduce advanced variants of binary search to GPUs and equip them with a set of established low-level optimizations. These architecture-specific optimizations aim at getting the most out of binary search by (a) greatly reducing the number of GPU memory accesses required during search, (b) exploiting the enormous benefits of memory access coalescing on a GPU, and (c) maximizing scalability by reordering the dataset into a more favorable layout. By comparing our optimized search strategies against nine state-of-the-art GPU index structures under several static indexing workloads, we demonstrate that they not only outperform all competitors (except for hashing-based approaches) by a factor of up to 3.8, but also maintain the smallest possible memory footprint.
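To make the idea in (c) concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation evaluated in this work. It contrasts a plain lower-bound binary search on a sorted array with the same lookup on a reordered copy of that array. The abstract does not name the layout it uses; the breadth-first (Eytzinger) order below is one well-known example and is an assumption made purely for illustration. The function names `lower_bound`, `build_eytzinger`, and `eytzinger_lower_bound` are hypothetical.

```python
from typing import List, Optional


def lower_bound(sorted_keys: List[int], key: int) -> int:
    """Index of the first element >= key in a sorted list (len(list) if none)."""
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


def build_eytzinger(sorted_keys: List[int]) -> List[Optional[int]]:
    """Reorder sorted keys into 1-based breadth-first order; slot 0 is unused.

    Assumption: this layout stands in for the "more favorable layout"
    mentioned in the text, which the abstract does not specify.
    """
    n = len(sorted_keys)
    layout: List[Optional[int]] = [None] * (n + 1)
    remaining = iter(sorted_keys)

    def fill(node: int) -> None:
        # An in-order walk of the implicit tree visits slots in sorted order.
        if node <= n:
            fill(2 * node)
            layout[node] = next(remaining)
            fill(2 * node + 1)

    fill(1)
    return layout


def eytzinger_lower_bound(layout: List[Optional[int]], key: int) -> Optional[int]:
    """Smallest stored key >= key, or None if every stored key is smaller."""
    n = len(layout) - 1
    node, best = 1, None
    while node <= n:
        if layout[node] < key:
            node = 2 * node + 1      # answer lies in the right subtree
        else:
            best = layout[node]      # candidate; look for a smaller one on the left
            node = 2 * node
    return best
```

In this layout the first few levels of the implicit search tree sit next to each other at the front of the array, so lookups that run side by side touch the same small region early in the search. Whether and how this maps onto coalesced accesses on a particular GPU is not something the sketch demonstrates.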