NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator

Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-Euclidean data across domains ranging from social network analysis to bioinformatics. Despite their effectiveness, adoption has not been pervasive because of the scalability challenges posed by large-scale graph datasets, particularly when leveraging message passing. To tackle these challenges, we introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm. NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication. This separation allows their distinct data dependencies to be exploited independently, facilitating efficient resource allocation. We introduce a rolling eviction strategy that mitigates data idling in on-chip memory and addresses the prevalent issue of memory bloat in sparse graph computations. Furthermore, compute-resource load balancing is achieved through a dynamic reseeding hash-based mapping, which ensures uniform utilization of computing resources regardless of sparsity pattern. Finally, we present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis. Overall, NeuraChip yields an average speedup of 22.1x over Intel's MKL, 17.1x over NVIDIA's cuSPARSE, 16.7x over AMD's hipSPARSE, and 1.5x and 1.3x over prior state-of-the-art SpGEMM and GNN accelerators, respectively. The source code for our open-source simulator and performance visualizer is publicly available on GitHub: https://neurachip.us
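
To make the decoupling concrete, the following is a minimal software sketch of a row-wise (Gustavson-style) sparse matrix product in which the multiply phase and the accumulate phase are kept separate, and partial products are routed to accumulators by a seeded hash of their output coordinate. It is an illustration under stated assumptions, not NeuraChip's hardware design: the function name `gustavson_decoupled`, the dict-of-rows matrix layout, the `num_accumulators` and `seed` parameters, and the use of Python's built-in `hash` are all hypothetical choices made here. The fixed `seed` merely stands in for the dynamic reseeding mentioned above, and rolling eviction is not modeled.

```python
from collections import defaultdict


def gustavson_decoupled(A, B, num_accumulators=4, seed=0):
    """Compute C = A @ B for sparse matrices stored as {row: {col: value}}.

    Returns (C, load), where load[u] is the number of distinct output
    entries that accumulator u ended up holding.
    """
    # Phase 1 (multiply only): walk A row by row, as in Gustavson's
    # algorithm, and emit partial products tagged with their output (i, j).
    partials = []
    for i, a_row in A.items():
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():
                partials.append(((i, j), a_ik * b_kj))

    # Phase 2 (accumulate only): route each partial product to an
    # accumulator chosen by a seeded hash of its tag, so that all partials
    # for the same (i, j) meet in the same unit.
    # Assumption: a fixed seed stands in for dynamic reseeding.
    accumulators = [defaultdict(float) for _ in range(num_accumulators)]
    for tag, value in partials:
        unit = hash((seed, tag)) % num_accumulators
        accumulators[unit][tag] += value

    # Gather the accumulated entries back into dict-of-rows form.
    C = defaultdict(dict)
    for acc in accumulators:
        for (i, j), v in acc.items():
            C[i][j] = v
    return dict(C), [len(acc) for acc in accumulators]


if __name__ == "__main__":
    A = {0: {0: 1.0, 1: 2.0}, 1: {1: 3.0}}
    B = {0: {0: 4.0}, 1: {0: 5.0, 1: 6.0}}
    C, load = gustavson_decoupled(A, B)
    print(C, load)
```

Because the multiply phase emits only tagged values, the accumulate phase can be scheduled, sized, and balanced independently. Changing `seed` redistributes output entries across accumulators without changing the result.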
