Graph Neural Networks (GNNs) are emerging machine learning (ML) models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total execution time because they are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML framework that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for the memory-intensive kernels of GNNs, tailored for real PIM systems, and develop an easy-to-use Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed on processor-centric and memory-centric computing systems, respectively, to match their algorithmic nature. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system, and hardware designers. PyGim will be open-sourced to enable the widespread use of PIM systems in GNNs.
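To make the split between the two kernel types concrete, the following is a minimal sketch of one GNN layer. It is not PyGim's API. The abstract does not name the kernels, so the sketch assumes, as is common for GNN layers, that the memory-intensive kernel is neighbor aggregation (a sparse adjacency matrix times a dense feature matrix) and the compute-intensive kernel is the combination step (a dense matrix multiply plus activation). The function names `aggregate_csr`, `combine_dense`, and `gnn_layer`, the CSR layout, and the ReLU activation are all illustrative assumptions.

```python
# Illustrative sketch only: NOT PyGim's API. All names are hypothetical.
# Assumption: aggregation is the memory-intensive kernel (sparse x dense),
# combination is the compute-intensive kernel (dense x dense).

def aggregate_csr(row_ptr, col_idx, values, features):
    """Sparse aggregation: CSR adjacency matrix times dense feature matrix.

    Each output row gathers feature rows of that node's neighbors, so the
    access pattern is irregular and dominated by memory traffic.
    """
    num_nodes = len(row_ptr) - 1
    width = len(features[0])
    out = [[0.0] * width for _ in range(num_nodes)]
    for row in range(num_nodes):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            neighbor = col_idx[k]
            weight = values[k]
            for j in range(width):
                out[row][j] += weight * features[neighbor][j]
    return out


def combine_dense(aggregated, weights):
    """Dense combination: matrix multiply by the layer weights, then ReLU."""
    out_width = len(weights[0])
    result = []
    for row in aggregated:
        new_row = []
        for j in range(out_width):
            acc = sum(row[i] * weights[i][j] for i in range(len(row)))
            new_row.append(max(acc, 0.0))  # ReLU (assumed activation)
        result.append(new_row)
    return result


def gnn_layer(row_ptr, col_idx, values, features, weights):
    """One layer: in a hybrid scheme, the first call would be offloaded to the
    memory-centric system and the second kept on the processor-centric one."""
    aggregated = aggregate_csr(row_ptr, col_idx, values, features)
    return combine_dense(aggregated, weights)


if __name__ == "__main__":
    # 3-node graph: node 0 has neighbors 1 and 2, node 1 has neighbor 0,
    # node 2 has no neighbors.
    row_ptr = [0, 2, 3, 3]
    col_idx = [1, 2, 0]
    values = [1.0, 1.0, 1.0]
    features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    weights = [[1.0, -1.0], [0.0, 1.0]]
    print(gnn_layer(row_ptr, col_idx, values, features, weights))
```

In this sketch the two functions share only the intermediate `aggregated` matrix, which is the data that would have to move between the two systems under a hybrid execution scheme.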