Figaro on GPUs: Two Tables

This paper introduces the implementation of the Figaro-GPU algorithm, which computes the QR and SVD decompositions of a join matrix defined by the natural join of two tables on GPUs. Figaro-GPU's main novelty is a GPU implementation of the Figaro algorithm \cite{olteanu2022givens, vzivanovic2022linear,olteanu2024givens}: symbolic transformations combined with GPU-parallelized computations. This leads to theoretical performance improvements proportional to the ratio of the join size to the input size. In experiments with synthetic tables that compute the upper triangular matrix and the right singular vectors matrix, Figaro-GPU outperforms the NVIDIA cuSolver library in runtime for the upper triangular matrix by a factor proportional to the gap between the join and input sizes. This factor ranges from 5x to 150x on an NVIDIA 2070 and reaches up to 160x on an NVIDIA 4080, while Figaro-GPU uses up to 1000x less memory than cuSolver on the GPU. For computing singular values, Figaro-GPU outperforms cuSolver in runtime by 2.8x to 31x on an NVIDIA 4080.
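To make the idea of working on the inputs instead of the join concrete, here is a minimal CPU sketch in NumPy. It is not the paper's GPU implementation, and it rests on several assumptions: every row shares a single join key value, so the natural join degenerates to a Cartesian product; Householder reflectors stand in for the Givens rotation sequences that Figaro uses; and the names `head_and_tail`, `reduced_matrix`, and `upper_r` are hypothetical helpers introduced only for this illustration. For tables with m and n rows, the sketch builds a matrix with m + n - 1 rows whose upper triangular factor matches that of the join matrix with m * n rows.

```python
import numpy as np


def head_and_tail(M):
    """Apply a Householder reflector H with H @ ones = -sqrt(k) * e_1 to M.

    The first row of the result (the "head") is a scaled sum of the rows of M;
    the remaining rows (the "tail") span the orthogonal complement.
    """
    k = M.shape[0]
    v = np.full(k, 1.0 / np.sqrt(k))
    v[0] += 1.0
    HM = M - np.outer(v, (2.0 / (v @ v)) * (v @ M))
    return HM[:1], HM[1:]


def reduced_matrix(S, T):
    """Build an (m + n - 1)-row matrix with the same R factor as the
    Cartesian-product join matrix of S (m rows) and T (n rows)."""
    (m, p), (n, q) = S.shape, T.shape
    head_S, tail_S = head_and_tail(S)
    head_T, tail_T = head_and_tail(T)
    return np.block([
        [np.sqrt(n) * head_S, np.sqrt(m) * head_T],
        [np.sqrt(n) * tail_S, np.zeros((m - 1, q))],
        [np.zeros((n - 1, p)), np.sqrt(m) * tail_T],
    ])


def upper_r(A):
    """R factor of A, with row signs fixed so the diagonal is non-negative."""
    R = np.linalg.qr(A, mode="r")
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return signs[:, None] * R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.standard_normal((6, 3))
    T = rng.standard_normal((5, 2))

    # Materialized join (Cartesian product): every row of S paired with every row of T.
    join = np.hstack([np.repeat(S, T.shape[0], axis=0), np.tile(T, (S.shape[0], 1))])

    R_join = upper_r(join)
    R_small = upper_r(reduced_matrix(S, T))
    print("R factors agree:", np.allclose(R_join, R_small))

    # The singular values of the join matrix equal those of its R factor.
    sv_join = np.linalg.svd(join, compute_uv=False)
    sv_small = np.linalg.svd(R_small, compute_uv=False)
    print("singular values agree:", np.allclose(sv_join, sv_small))
```

In this sketch the expensive factorization runs on a matrix whose row count grows with the input size rather than the join size, which is the source of the speedup the abstract describes. A natural join with several key values would need the same treatment per key group, which this example does not attempt.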
