With the growing interest in Machine Learning (ML), Graphics Processing Units (GPUs) have become key elements of any computing infrastructure. Their widespread deployment in data centers and the cloud raises the question of how to use them beyond ML use cases, and there is growing interest in employing them in a database context. In this paper, we explore and analyze the implementation of relational joins on GPUs from an end-to-end perspective, meaning that we take result materialization into account. We conduct a comprehensive performance study of state-of-the-art GPU-based join algorithms over diverse synthetic workloads and the TPC-H/TPC-DS benchmarks. Rather than restricting ourselves to the conventional setting, where each input relation has only one key and one non-key attribute and every attribute is 4 bytes long, we investigate the effect of various factors (e.g., input sizes, number of non-key columns, skewness, data types, match ratios, and number of joins) on the end-to-end throughput. Furthermore, we propose a technique called "Gather-from-Transformed-Relations" (GFTR) to reduce the long-ignored yet high materialization cost in GPU-based joins. The experimental evaluation shows significant performance improvements from GFTR, with throughput gains of up to 2.3 times over previous work. The insights gained from the performance study not only advance the understanding of GPU-based joins but also yield a structured approach to selecting the most efficient GPU join algorithm based on the characteristics of the input relations.
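To make the notion of an end-to-end join concrete, the following is a minimal CPU-side sketch in plain Python of a join split into a matching phase and a materialization phase. It is an illustration only: it is neither GFTR nor any of the GPU algorithms studied in the paper, and the function names (`match_row_ids`, `gather`, `join_end_to_end`) and the column-dictionary layout are assumptions made for this example.

```python
from collections import defaultdict


def match_row_ids(r_keys, s_keys):
    """Phase 1: hash join on the key columns only; returns matching row-id pairs."""
    table = defaultdict(list)
    for r_id, key in enumerate(r_keys):
        table[key].append(r_id)
    r_ids, s_ids = [], []
    for s_id, key in enumerate(s_keys):
        for r_id in table.get(key, ()):
            r_ids.append(r_id)
            s_ids.append(s_id)
    return r_ids, s_ids


def gather(column, row_ids):
    """Phase 2 (materialization): fetch one column's values by row id."""
    return [column[i] for i in row_ids]


def join_end_to_end(r, s, key="k"):
    """Join two column-oriented relations (dicts of equal-length lists)."""
    r_ids, s_ids = match_row_ids(r[key], s[key])
    out = {key: gather(r[key], r_ids)}
    # One gather per non-key column: this work grows with the number of
    # non-key columns, which a key-only measurement would not capture.
    for name, col in r.items():
        if name != key:
            out["r_" + name] = gather(col, r_ids)
    for name, col in s.items():
        if name != key:
            out["s_" + name] = gather(col, s_ids)
    return out
```

In this sketch the matching phase touches only the key columns, while materialization performs one gather per non-key column, so its share of the total work grows with the number and width of those columns. The row ids produced by matching are generally not in sequential order, so each gather amounts to a series of scattered reads.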