Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total execution time and are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML framework that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for the memory-intensive kernels of GNNs, tailored for real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed on processor-centric and memory-centric computing systems, respectively, to match their algorithmic nature. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average, and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system, and hardware designers. PyGim will be open-sourced to enable the widespread use of PIM systems in GNNs.
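The hybrid split between kernel types can be illustrated with a minimal sketch. It assumes a GCN-style layer in which the memory-intensive kernel is a sparse neighbor aggregation and the compute-intensive kernel is a dense weight multiplication followed by ReLU. The function names, the row-wise partitioning, and the plain-Python data layout are illustrative assumptions; they are not PyGim's API and do not claim to reproduce its parallelization techniques.

```python
def build_csr(num_nodes, edges):
    """Build CSR row pointers and column indices from (dst, src) edge pairs."""
    indptr = [0] * (num_nodes + 1)
    for dst, _ in edges:
        indptr[dst + 1] += 1
    for i in range(num_nodes):
        indptr[i + 1] += indptr[i]
    # Group source nodes by destination row (stable sort keeps input order).
    indices = [src for _, src in sorted(edges, key=lambda e: e[0])]
    return indptr, indices


def aggregate(indptr, indices, features, row_start, row_end):
    """Memory-intensive kernel: sum neighbor features for rows [row_start, row_end).

    Each call stands in for the work one memory-side worker would do on its
    own slice of the graph (hypothetical partitioning, for illustration only).
    """
    width = len(features[0])
    out = []
    for row in range(row_start, row_end):
        acc = [0.0] * width
        for k in range(indptr[row], indptr[row + 1]):
            neighbor = features[indices[k]]  # irregular, data-dependent access
            for j in range(width):
                acc[j] += neighbor[j]
        out.append(acc)
    return out


def combine(aggregated, weight):
    """Compute-intensive kernel: dense matrix multiply followed by ReLU."""
    out_width = len(weight[0])
    result = []
    for row in aggregated:
        new_row = []
        for j in range(out_width):
            value = sum(row[i] * weight[i][j] for i in range(len(row)))
            new_row.append(max(value, 0.0))
        result.append(new_row)
    return result


def hybrid_layer(indptr, indices, features, weight, num_partitions):
    """Run aggregation per row partition, then the dense combination once."""
    num_nodes = len(indptr) - 1
    bounds = [p * num_nodes // num_partitions for p in range(num_partitions + 1)]
    aggregated = []
    for p in range(num_partitions):
        aggregated.extend(
            aggregate(indptr, indices, features, bounds[p], bounds[p + 1])
        )
    return combine(aggregated, weight)


# Tiny example graph: 4 nodes, edges given as (destination, source).
edges = [(0, 1), (0, 2), (1, 0), (2, 3), (3, 3)]
indptr, indices = build_csr(4, edges)
features = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
weight = [[1.0, -1.0], [0.5, 2.0]]
output = hybrid_layer(indptr, indices, features, weight, num_partitions=2)
```

In this sketch the aggregation loop does one addition per feature value loaded and follows data-dependent indices, which is why such kernels tend to be limited by data movement, while the combination step reuses each loaded value across many multiply-adds.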