Graph Neural Networks (GNNs) are emerging ML models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels; the latter dominate the total execution time and are significantly bottlenecked by data movement between memory and processors. Processing-In-Memory (PIM) systems can alleviate this data movement bottleneck by placing simple processors near or inside memory arrays. In this work, we introduce PyGim, an efficient ML library that accelerates GNNs on real PIM systems. We propose intelligent parallelization techniques for the memory-intensive kernels of GNNs, tailored for real PIM systems, and develop a handy Python API for them. We provide hybrid GNN execution, in which the compute-intensive and memory-intensive kernels are executed in processor-centric and memory-centric computing systems, respectively. We extensively evaluate PyGim on a real-world PIM system with 1992 PIM cores using emerging GNN models, and demonstrate that it outperforms its state-of-the-art CPU counterpart on Intel Xeon by 3.04x on average, and achieves higher resource utilization than CPU and GPU systems. Our work provides useful recommendations for software, system, and hardware designers. PyGim is publicly available at https://github.com/CMU-SAFARI/PyGim.
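To make the compute-/memory-intensive split concrete, the sketch below shows a generic GNN layer (not the PyGim API; all names are hypothetical): aggregation over the sparse graph adjacency is the memory-bound kernel with irregular accesses and low arithmetic intensity, while the dense combination (GEMM) is the compute-bound kernel that suits processor-centric hardware.

```python
# Illustrative sketch only -- NOT the PyGim API. Names are hypothetical.
import numpy as np
import scipy.sparse as sp

def gnn_layer(adj: sp.csr_matrix, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    # Memory-intensive kernel: sparse aggregation over graph neighbors.
    # Irregular memory accesses, low compute per byte -> PIM-friendly.
    aggregated = adj @ feats
    # Compute-intensive kernel: dense combination (GEMM).
    # Regular accesses, high compute per byte -> CPU/GPU-friendly.
    combined = aggregated @ weight
    return np.maximum(combined, 0.0)  # ReLU activation

# Toy 3-node path graph (0 -- 1 -- 2) with 4-dimensional features.
adj = sp.csr_matrix(np.array([[0, 1, 0],
                              [1, 0, 1],
                              [0, 1, 0]], dtype=np.float32))
feats = np.ones((3, 4), dtype=np.float32)
weight = np.eye(4, dtype=np.float32)
out = gnn_layer(adj, feats, weight)
```

In a hybrid execution scheme like the one the abstract describes, the sparse aggregation step would be dispatched to the memory-centric (PIM) side and the dense GEMM to the processor-centric side.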