NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator

Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-Euclidean data across domains ranging from social network analysis to bioinformatics. Despite their effectiveness, their adoption has not been pervasive because of the scalability challenges posed by large-scale graph datasets, particularly when leveraging message passing. To tackle these challenges, we introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm. NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication. This separation allows their distinct data dependencies to be exploited independently, facilitating efficient resource allocation. We introduce a rolling eviction strategy to mitigate data idling in on-chip memory and to address the prevalent issue of memory bloat in sparse graph computations. Furthermore, compute-resource load balancing is achieved through a dynamic-reseeding, hash-based mapping that ensures uniform utilization of computing resources regardless of sparsity pattern. Finally, we present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis. Overall, NeuraChip yields an average speedup of 22.1x over Intel's MKL, 17.1x over NVIDIA's cuSPARSE, 16.7x over AMD's hipSPARSE, 1.5x over the prior state-of-the-art SpGEMM accelerator, and 1.3x over the prior state-of-the-art GNN accelerator. The source code for our open-source simulator and performance visualizer is publicly accessible on GitHub via https://neurachip.us
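
To make the decoupling concrete, the following is a minimal software sketch, not the NeuraChip hardware design or NeuraSim code. It assumes sparse matrices stored as nested dictionaries, and the names `multiply_phase`, `accumulate_phase`, `num_bins`, and `seed` are hypothetical, introduced only for illustration. The multiply phase walks the operands row by row in the manner of Gustavson's algorithm and emits tagged partial products; the accumulate phase hashes each output coordinate to an accumulator bin, with a seed that can be changed to redistribute the load.

```python
from collections import defaultdict


def multiply_phase(A, B):
    """Emit (row, col, partial_product) tuples for C = A @ B.

    A and B are dict-of-dicts sparse matrices: A[i][k] = value.
    Rows of B are scaled by the nonzeros of each row of A (Gustavson's
    row-wise formulation); no addition happens in this phase.
    """
    for i, row_a in A.items():
        for k, a_ik in row_a.items():
            for j, b_kj in B.get(k, {}).items():
                yield i, j, a_ik * b_kj


def accumulate_phase(partials, num_bins=4, seed=0):
    """Sum partial products, routing each (row, col) tag to a bin by hash.

    `num_bins` stands in for a pool of accumulators and `seed` for a
    reseedable hash; both are illustrative assumptions.
    """
    bins = [defaultdict(float) for _ in range(num_bins)]
    for i, j, p in partials:
        b = hash((seed, i, j)) % num_bins  # same tag always maps to same bin
        bins[b][(i, j)] += p
    C = defaultdict(dict)
    for acc in bins:
        for (i, j), value in acc.items():
            C[i][j] = value
    loads = [len(acc) for acc in bins]  # distinct outputs held per bin
    return dict(C), loads


A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0, 1: 6.0}, 2: {1: 7.0}}
C, loads = accumulate_phase(multiply_phase(A, B))
# C == {0: {1: 18.0}, 1: {0: 15.0, 1: 18.0}}
```

Because every partial product carries its output coordinate, the two phases share no ordering constraint beyond the tag itself, and changing `seed` alters only which bin holds a given output, never the result.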
