A Unified CPU-GPU Protocol for GNN Training

Training a Graph Neural Network (GNN) model on large-scale graphs involves a high volume of data communication and computation. While state-of-the-art CPUs and GPUs offer high computing power, the standard GNN training protocol adopted in existing GNN frameworks cannot efficiently utilize the platform resources. To this end, we propose a novel Unified CPU-GPU protocol that improves the resource utilization of GNN training on a CPU-GPU platform. The Unified CPU-GPU protocol instantiates multiple GNN training processes in parallel on both the CPU and the GPU. By allocating training processes on the CPU to perform GNN training collaboratively with the GPU, the proposed protocol improves platform resource utilization and reduces the CPU-GPU data transfer overhead. Since CPUs and GPUs differ in performance, we develop a novel load balancer that dynamically balances the workload between CPUs and GPUs at runtime. We evaluate our protocol using two representative GNN sampling algorithms and two widely used GNN models on three datasets. Compared with the standard training protocol adopted in state-of-the-art GNN frameworks, our protocol improves resource utilization and reduces overall training time. On a platform where the GPU moderately outperforms the CPU, our protocol speeds up GNN training by up to 1.41x. On a platform where the GPU significantly outperforms the CPU, it speeds up GNN training by up to 1.26x. Our protocol is open-sourced and can be seamlessly integrated into state-of-the-art GNN frameworks to accelerate GNN training. It particularly benefits users with limited GPU access, since GPUs are in high demand.
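To make the idea of balancing work between CPU and GPU training processes concrete, here is a minimal sketch in Python. It is not the authors' load balancer. It assumes a simple throughput-proportional policy: after each epoch, every training process reports how many mini-batches it handled and how long that took, and the next epoch's mini-batches are split in proportion to the measured throughput so that all processes finish at roughly the same time. The names `WorkerStats` and `rebalance`, and the assumption that per-batch time stays roughly constant across epochs, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    """Measured behaviour of one training process in the last epoch (hypothetical)."""
    name: str        # e.g. "gpu" or "cpu"
    batches: int     # mini-batches assigned in the last epoch
    seconds: float   # wall-clock time spent processing them


def rebalance(workers: list[WorkerStats], total_batches: int) -> dict[str, int]:
    """Split the next epoch's mini-batches in proportion to measured throughput.

    Assumes per-batch time is roughly constant, so throughput = batches / seconds,
    and a share proportional to throughput lets all workers finish together.
    """
    throughput = {w.name: w.batches / w.seconds for w in workers}
    total_tp = sum(throughput.values())
    # Floor each proportional share first.
    shares = {n: int(total_batches * tp / total_tp) for n, tp in throughput.items()}
    # Give the few leftover batches to the fastest workers so shares sum exactly.
    leftover = total_batches - sum(shares.values())
    for n in sorted(throughput, key=throughput.get, reverse=True)[:leftover]:
        shares[n] += 1
    return shares


if __name__ == "__main__":
    stats = [WorkerStats("gpu", 100, 10.0), WorkerStats("cpu", 100, 40.0)]
    print(rebalance(stats, 200))
```

For example, if a GPU process finished 100 mini-batches in 10 s and a CPU process finished 100 in 40 s, `rebalance(stats, 200)` assigns 160 batches to the GPU and 40 to the CPU, so each would need about 16 s in the next epoch. A practical balancer would likely also have to account for sampling cost, CPU-GPU transfer time, and measurement noise, which this sketch ignores.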
