The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for multi-GPU platforms. However, existing multi-GPU GNN systems optimize computation and communication separately, following the conventional practice for scaling dense DNNs. For irregular, sparse, and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule and optimize computation and communication operations to deliver high performance. To this end, we propose MGG, a novel system design that accelerates full-graph GNNs on multi-GPU platforms. The core of MGG is a dynamic software pipeline that enables fine-grained overlapping of computation and communication within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to balance workloads and overlap operations. MGG also incorporates an intelligent runtime with analytical modeling and optimization heuristics that dynamically improve execution performance. Extensive evaluation shows that MGG outperforms state-of-the-art full-graph GNN systems across various settings, running on average 4.41×, 4.81×, and 10.83× faster than DGL, MGG-UVM, and ROC, respectively.
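To make the benefit of computation-communication overlapping concrete, the following is a minimal sketch of a generic two-stage pipeline cost model. It is not MGG's analytical model. It assumes that remote-data transfers and computation each run on their own resource, that each resource processes chunks in order, and that buffering is unbounded. The functions `serialized_time`, `pipelined_time`, and `best_num_chunks` are hypothetical names for illustration. The last function mimics, in a very simplified way, the idea of using a model plus a search heuristic to pick a pipeline granularity, with a fixed per-chunk overhead as an assumed cost.

```python
from typing import Sequence


def serialized_time(compute: Sequence[float], comm: Sequence[float]) -> float:
    """Baseline: every chunk first fetches its remote data, then computes; no overlap."""
    return sum(compute) + sum(comm)


def pipelined_time(compute: Sequence[float], comm: Sequence[float]) -> float:
    """Two-resource pipeline: fetching chunk i may overlap with computing chunk i-1.

    Assumptions (illustrative only): communication and computation each run on
    their own resource, each resource handles one chunk at a time in order, and
    buffering is unbounded, so transfers never wait for computation.
    """
    if len(compute) != len(comm):
        raise ValueError("compute and comm must describe the same chunks")
    comm_done = 0.0  # time at which the communication resource becomes free
    comp_done = 0.0  # time at which the compute resource becomes free
    for c, m in zip(compute, comm):
        comm_done += m                     # transfers are issued back to back
        start = max(comm_done, comp_done)  # needs its data and an idle compute unit
        comp_done = start + c
    return comp_done


def best_num_chunks(total_compute: float, total_comm: float,
                    per_chunk_overhead: float, max_chunks: int) -> int:
    """Hypothetical tuning heuristic: split the work into k equal chunks, charge a
    fixed overhead per chunk, and return the k in [1, max_chunks] with the lowest
    modeled pipelined time."""
    def modeled(k: int) -> float:
        compute = [total_compute / k + per_chunk_overhead] * k
        comm = [total_comm / k] * k
        return pipelined_time(compute, comm)
    return min(range(1, max_chunks + 1), key=modeled)


if __name__ == "__main__":
    comp, comm = [2.0] * 4, [1.0] * 4
    print(serialized_time(comp, comm))         # 12.0
    print(pipelined_time(comp, comm))          # 9.0
    print(best_num_chunks(8.0, 4.0, 0.1, 16))  # 6
```

In the compute-bound example, overlapping hides all but the first transfer (1 + 4 × 2 = 9 instead of 12). Finer chunks hide more communication but pay more per-chunk overhead, which is why the granularity is a tuning knob rather than "as fine as possible".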