Graph neural networks (GNNs) have attracted significant interest for applications such as citation network analysis and drug discovery because they apply machine learning techniques to graph-structured data. GNNs typically employ a two-stage execution pipeline consisting of combination and aggregation kernels. The combination stage performs data-intensive convolution operations with relatively regular memory access patterns, whereas the aggregation stage operates on sparse graph data with highly irregular accesses. These heterogeneous memory behaviors make conventional CPU- and GPU-based execution energy-inefficient because of substantial data movement overheads. Existing accelerators attempt to mitigate these challenges using specialized architectures and processing-in-memory (PIM) techniques. However, prior approaches often suffer from limited scalability, area overheads, restricted parallelism, and the energy inefficiency of analog compute and dedicated accelerator structures. This paper presents NEM-GNN, a scalable, DAC/ADC-less processing-in-memory architecture for GNN acceleration. The proposed design introduces early compute termination mechanisms, pre-computation using reconfigurable system-on-chip components, and graph- and sparsity-aware near-memory aggregation using a compute-as-soon-as-ready (CAR) and broadcast-based execution model. Experimental results show that NEM-GNN achieves approximately 80--230x higher performance, 80--300x higher throughput, 850--1134x better energy efficiency, and 7--8x higher compute density than prior state-of-the-art approaches.
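To make the two-stage pipeline concrete, the following is a minimal software sketch of one GNN layer split into a combination kernel and an aggregation kernel. It is not the NEM-GNN design or its hardware dataflow. The function names, the toy graph, the sum-based aggregation, and the combination-before-aggregation ordering are all assumptions chosen for illustration. The point is only to contrast the regular, dense access pattern of combination with the neighbor-driven, irregular access pattern of aggregation.

```python
# Hypothetical toy example: all names and data are illustrative assumptions,
# not taken from the NEM-GNN paper.

def combine(features, weights):
    """Combination: dense transform X @ W, reading rows and columns in order."""
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*weights)]
        for row in features
    ]


def aggregate(neighbors, transformed):
    """Aggregation: sum the transformed features over each node's neighbors.

    The neighbor lists encode a sparse graph, so which rows are read depends
    on the graph structure (irregular, data-dependent accesses).
    """
    width = len(transformed[0])
    out = []
    for nbrs in neighbors:
        acc = [0.0] * width
        for j in nbrs:  # scattered row reads driven by the adjacency list
            for k in range(width):
                acc[k] += transformed[j][k]
        out.append(acc)
    return out


def gnn_layer(neighbors, features, weights):
    """One layer: combination first, then aggregation (assumed ordering)."""
    return aggregate(neighbors, combine(features, weights))


# Toy 3-node graph given as adjacency lists that include self-loops.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 2.0], [3.0, 4.0]]

result = gnn_layer(neighbors, features, weights)
print(result)  # [[4.0, 6.0], [8.0, 12.0], [7.0, 10.0]]
```

In this sketch the work in `combine` is fixed by the matrix shapes, while the work in `aggregate` depends on each node's degree. That difference in access pattern is what the abstract refers to as heterogeneous memory behavior.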