Graph Neural Networks (GNNs) are powerful tools for learning from graph-structured data, but their scalability is hindered by inefficient mini-batch generation, data-transfer bottlenecks, and costly inter-GPU synchronization. Existing training frameworks fail to overlap these stages, leading to suboptimal resource utilization. This paper proposes MQ-GNN, a multi-queue pipelined framework that maximizes training efficiency by interleaving GNN training stages and optimizing resource utilization. MQ-GNN introduces the Ready-to-Update Asynchronous Consistent Model (RaCoM), which enables asynchronous gradient sharing and model updates while ensuring global consistency through adaptive periodic synchronization. In addition, it employs global neighbor sampling with caching to reduce data-transfer overhead and an adaptive queue-sizing strategy to balance computation and memory efficiency. Experiments on four large-scale datasets with ten baseline models demonstrate that MQ-GNN achieves up to $4.6\times$ faster training and up to 30% higher GPU utilization while maintaining competitive accuracy. These results establish MQ-GNN as a scalable and efficient solution for multi-GPU GNN training.
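To make the pipelining idea concrete, the following is a minimal, illustrative sketch (not the MQ-GNN implementation) of overlapping the sample, transfer, and compute stages with bounded queues, plus a periodic synchronization step standing in for RaCoM-style consistency. All names here (QUEUE_SIZE, SYNC_PERIOD, sample_minibatch, transfer_to_gpu, train_step, synchronize_replicas) are hypothetical placeholders introduced for illustration only.

```python
# Conceptual sketch of multi-queue pipelined GNN training.
# Placeholders stand in for sampling, host-to-device transfer, and training;
# queue sizes and the sync period are adaptive in MQ-GNN but fixed here.
import queue
import threading

QUEUE_SIZE = 4     # bounded queues balance overlap against memory use
SYNC_PERIOD = 8    # synchronize model replicas every SYNC_PERIOD steps
NUM_BATCHES = 32

def sample_minibatch(i):
    # Placeholder for neighbor sampling (MQ-GNN adds a global sample cache).
    return f"subgraph-{i}"

def transfer_to_gpu(batch):
    # Placeholder for copying features and graph structure to the device.
    return f"{batch}@gpu"

def train_step(batch):
    # Placeholder for forward/backward plus asynchronous gradient sharing.
    pass

def synchronize_replicas():
    # Placeholder for the periodic global model synchronization.
    pass

sample_q = queue.Queue(maxsize=QUEUE_SIZE)
gpu_q = queue.Queue(maxsize=QUEUE_SIZE)

def sampler():
    for i in range(NUM_BATCHES):
        sample_q.put(sample_minibatch(i))
    sample_q.put(None)  # sentinel marks end of stream

def loader():
    while (batch := sample_q.get()) is not None:
        gpu_q.put(transfer_to_gpu(batch))
    gpu_q.put(None)

threading.Thread(target=sampler, daemon=True).start()
threading.Thread(target=loader, daemon=True).start()

step = 0
while (batch := gpu_q.get()) is not None:
    train_step(batch)           # overlaps with sampling/transfer of later batches
    step += 1
    if step % SYNC_PERIOD == 0:
        synchronize_replicas()  # adaptive period in MQ-GNN; fixed here

print(f"processed {step} mini-batches")
```

The design point the sketch highlights is that bounded queues let the sampler and loader run ahead of training without unbounded memory growth, while the periodic synchronization keeps asynchronous updates from drifting too far apart.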