Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total execution time and are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML library that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for the memory-intensive kernels of GNNs, tailored to real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed on processor-centric and memory-centric computing systems, respectively. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average, and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system, and hardware designers. PyGim is publicly available at https://github.com/CMU-SAFARI/PyGim.
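The memory-intensive kernel referred to above is the neighborhood aggregation step, which in many GNNs reduces to a sparse-dense matrix multiplication (SpMM) over the graph's adjacency matrix. A minimal sketch of this operation using SciPy (this is an illustration of the kernel itself, not the PyGim API):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 4-node directed graph stored as a CSR adjacency matrix.
# In GNN aggregation, each node sums its neighbors' feature vectors,
# which is exactly a sparse-dense matrix multiply: H_agg = A @ H.
rows = [0, 0, 1, 2, 3]
cols = [1, 2, 0, 0, 2]
vals = [1.0] * 5
A = csr_matrix((vals, (rows, cols)), shape=(4, 4))

# Dense node-feature matrix: 4 nodes, 2 feature dimensions.
H = np.arange(8, dtype=np.float64).reshape(4, 2)

# Memory-intensive step: A's sparsity pattern drives irregular,
# data-dependent accesses into H, so arithmetic intensity is low.
H_agg = A @ H

print(H_agg)
```

The low arithmetic intensity and irregular memory accesses of this SpMM are what make the kernel memory-bound on processor-centric systems, and thus a natural fit for PIM execution.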