Graph neural networks (GNNs) are widely applied in domains such as social networks, bioinformatics, and recommendation systems. However, the irregularity and sparsity of graph data challenge traditional computing methods, which cannot meet the performance demands of GNNs. Recent research has explored parallel acceleration using CUDA Cores and Tensor Cores, but two significant challenges persist: (1) kernel fusion yields spuriously high utilization because it fails to treat CUDA Cores and Tensor Cores as independent resources, and (2) the two heterogeneous core types have distinct computation preferences, causing inefficiencies when work is assigned indiscriminately. To address these issues, this paper proposes FTC-GNN, a novel acceleration framework that efficiently utilizes both CUDA Cores and Tensor Cores for GNN computation. FTC-GNN introduces (1) a collaborative design that enables the parallel utilization of CUDA Cores and Tensor Cores, and (2) a sparse-to-dense transformation strategy that assigns dense matrix operations to Tensor Cores while leveraging CUDA Cores for data management and sparse-edge processing. This design improves GPU resource utilization and computational efficiency. Experiments with GCN and AGNN models across multiple datasets demonstrate the effectiveness of FTC-GNN. For GCN, FTC-GNN achieves speedups of 4.90x, 7.10x, and 1.17x over DGL, PyG, and TC-GNN, respectively; for AGNN, it achieves speedups of 5.32x, 2.92x, and 1.02x, establishing its superiority in accelerating GNN computations.
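The sparse-to-dense idea underlying the abstract can be illustrated with a small, hardware-agnostic sketch: for each block of rows of a sparse adjacency matrix, the nonzero neighbor columns are gathered (the irregular, CUDA-Core-style work) into a compact dense tile, and a plain dense matmul is then run on that tile (the regular work that would be mapped to Tensor Cores). This is a conceptual sketch only, not the FTC-GNN implementation; the function name and tiling parameter are illustrative.

```python
import numpy as np

def sparse_to_dense_tiles(adj, feats, tile_rows=16):
    """Aggregate neighbor features (adj @ feats) tile by tile.

    For each block of rows, gather only the columns that hold
    nonzeros into a compact dense sub-matrix, then run a dense
    matmul on it -- the step that would be assigned to Tensor Cores.
    """
    n = adj.shape[0]
    out = np.zeros((n, feats.shape[1]), dtype=feats.dtype)
    for r0 in range(0, n, tile_rows):
        rows = slice(r0, min(r0 + tile_rows, n))
        # Irregular (CUDA-Core-style) work: scan the sparse rows and
        # collect the neighbor columns that are actually nonzero.
        cols = np.nonzero(adj[rows].any(axis=0))[0]
        if cols.size == 0:
            continue  # empty tile: nothing to aggregate
        dense_tile = adj[rows][:, cols]  # packed dense sub-matrix
        gathered = feats[cols]           # gathered neighbor features
        # Regular (Tensor-Core-style) work: dense matmul on the tile.
        out[rows] = dense_tile @ gathered
    return out
```

The result equals the full sparse aggregation `adj @ feats`, but each tile's inner product touches only the columns that carry edges, which is what makes the dense hardware path worthwhile on sparse graphs.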