Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate total execution time and are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML library that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for the memory-intensive kernels of GNNs, tailored to real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed on processor-centric and memory-centric computing systems, respectively. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average, while achieving higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system, and hardware designers. PyGim is publicly available at https://github.com/CMU-SAFARI/PyGim.
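To illustrate the hybrid execution idea described above, the sketch below splits one GNN layer into its two kernels: sparse aggregation (memory-bound, the part PyGim offloads to PIM) and dense combination (compute-bound, kept on the processor-centric side). This is a minimal NumPy illustration of the kernel split, not PyGim's actual API; the function names and the dense adjacency stand-in are assumptions for exposition (a real implementation would use a sparse format such as CSR).

```python
import numpy as np

# One GNN layer decomposes into two kernels:
#   1) Aggregation: A @ H   -- sparse-matrix times dense-matrix (SpMM),
#      memory-bound, the kernel a PIM system would execute.
#   2) Combination: H' @ W  -- dense GEMM plus activation, compute-bound,
#      kept on the CPU/GPU side.
# Names here are illustrative, not PyGim's interface.

rng = np.random.default_rng(0)
num_nodes, in_feats, out_feats = 128, 16, 8

# ~5% dense adjacency as a stand-in for a sparse graph (real code: CSR/COO).
A = (rng.random((num_nodes, num_nodes)) < 0.05).astype(np.float64)
H = rng.standard_normal((num_nodes, in_feats))
W = rng.standard_normal((in_feats, out_feats))

def aggregate_memory_bound(adj, feats):
    """Memory-intensive SpMM; in hybrid execution this runs on PIM cores."""
    return adj @ feats

def combine_compute_bound(feats, weight):
    """Compute-intensive dense GEMM + ReLU; runs on the host CPU/GPU."""
    return np.maximum(feats @ weight, 0.0)

H_agg = aggregate_memory_bound(A, H)     # memory-centric step
H_out = combine_compute_bound(H_agg, W)  # processor-centric step
assert H_out.shape == (num_nodes, out_feats)
```

The split matters because the SpMM has very low arithmetic intensity (few operations per byte moved), so its performance is set by memory bandwidth, which PIM provides in abundance, while the dense GEMM benefits from the high compute throughput of conventional processors.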