The era of GPU-powered data analytics has arrived. In this paper, we argue that recent advances in hardware (e.g., larger GPU memory, faster interconnects and I/O, and declining cost) and software (e.g., composable data systems and mature libraries) have removed the key barriers to wider adoption of GPU data analytics. We present Sirius, a prototype open-source GPU-native SQL engine that offers drop-in acceleration for diverse data systems. Sirius treats the GPU as the primary engine and builds on libraries such as libcudf for high-performance relational operators. It accepts query plans in the standard Substrait representation, so it can replace an existing database's CPU engine without changing the user-facing interface. Sirius achieves 8.3x and 7.4x better cost efficiency on TPC-H and ClickBench, respectively, when integrated with single-node DuckDB, and delivers a speedup of up to 12.5x when integrated with the distributed Apache Doris engine.