This paper presents Figaro-GPU, a GPU implementation of the Figaro algorithm \cite{olteanu2022givens, vzivanovic2022linear,olteanu2024givens} for computing the QR decomposition and the singular value decomposition (SVD) of a join matrix defined by the natural join of two tables. Its main novelty is the combination of Figaro's symbolic transformations with GPU-parallelized computation. This yields theoretical performance improvements proportional to the ratio of the join size to the input size. In experiments on synthetic tables that compute the upper triangular matrix and the matrix of right singular vectors, Figaro-GPU computes the upper triangular matrix faster than the NVIDIA cuSolver library by a factor proportional to the gap between the join and input sizes: 5x to 150x on an NVIDIA 2070 and up to 160x on an NVIDIA 4080, while using up to 1000x less memory than cuSolver. For computing singular values, Figaro-GPU runs 2.8x to 31x faster than cuSolver on an NVIDIA 4080.