Graph neural networks (GNNs), the backbone of graph-based machine learning, have recently demonstrated great success in various domains (e.g., e-commerce). However, GNN performance is often unsatisfactory because the underlying graph operations are highly sparse and irregular. To address this, we propose TC-GNN, the first GNN acceleration framework based on GPU Tensor Core Units (TCUs). The core idea is to reconcile the "sparse" GNN computation with the high-performance "dense" TCUs. Specifically, we conduct an in-depth analysis of the sparse operations in mainstream GNN computing frameworks; introduce a novel sparse graph translation technique that facilitates TCU processing of the sparse GNN workload; implement an effective CUDA core and TCU collaboration design to fully utilize GPU resources; and integrate TC-GNN with the PyTorch framework for high programmability. Rigorous experiments show an average speedup of 1.70X over the state-of-the-art DGL framework across various models and datasets.
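To make the "sparse versus dense" tension concrete, the following is a minimal sketch of one possible way to repack a sparse neighbor aggregation into small dense blocks. It is an illustration under stated assumptions, not the translation technique used by TC-GNN: the function names (`condense_row_windows`, `aggregate_dense_tiles`, `aggregate_sparse`), the CSR input layout, and the window size are all hypothetical, and plain Python loops stand in for the dense matrix multiply that a TCU would perform.

```python
def condense_row_windows(row_ptr, col_idx, window=4):
    """Split rows into fixed-height windows and list, per window,
    the distinct neighbor columns in sorted order (hypothetical helper)."""
    num_rows = len(row_ptr) - 1
    windows = []
    for start in range(0, num_rows, window):
        end = min(start + window, num_rows)
        cols = sorted(set(col_idx[row_ptr[start]:row_ptr[end]]))
        windows.append((start, end, cols))
    return windows


def aggregate_dense_tiles(row_ptr, col_idx, feats, window=4):
    """Sum neighbor features by building one small dense block per window."""
    dim = len(feats[0])
    out = [[0.0] * dim for _ in range(len(row_ptr) - 1)]
    for start, end, cols in condense_row_windows(row_ptr, col_idx, window):
        pos = {c: j for j, c in enumerate(cols)}
        # Dense (window rows x condensed columns) block of edge counts.
        block = [[0.0] * len(cols) for _ in range(end - start)]
        for r in range(start, end):
            for e in range(row_ptr[r], row_ptr[r + 1]):
                block[r - start][pos[col_idx[e]]] += 1.0
        # Dense block times the gathered feature rows (stand-in for a tile MMA).
        for i, row in enumerate(block):
            for j, weight in enumerate(row):
                for d in range(dim):
                    out[start + i][d] += weight * feats[cols[j]][d]
    return out


def aggregate_sparse(row_ptr, col_idx, feats):
    """Reference result: plain CSR neighbor sum, one edge at a time."""
    dim = len(feats[0])
    out = [[0.0] * dim for _ in range(len(row_ptr) - 1)]
    for r in range(len(row_ptr) - 1):
        for e in range(row_ptr[r], row_ptr[r + 1]):
            for d in range(dim):
                out[r][d] += feats[col_idx[e]][d]
    return out
```

In this sketch, each window's dense block is only as wide as the number of distinct neighbors in that window, rather than as wide as the whole graph, so fewer all-zero entries would be handed to a dense compute unit. Both aggregation functions return the same sums; they differ only in how the work is laid out.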