Graph neural networks (GNNs) have emerged as a powerful tool for processing graph-structured data in fields such as communication networks, molecular interactions, chemistry, social networks, and neuroscience. GNNs are characterized by ultra-sparse adjacency matrices, which necessitate dedicated hardware beyond general-purpose sparse matrix multipliers. Although dedicated hardware accelerators for GNNs have been studied extensively, few works have examined how the sparse storage format affects accelerator efficiency. This paper proposes SCV-GNN, which uses a novel sparse compressed vectors (SCV) format optimized for the aggregation operation. We use Z-Morton ordering to derive a data-locality-based computation ordering and partitioning scheme. The paper also shows how the proposed SCV-GNN scales on a vector processing system. Experimental results over various datasets show that the proposed method achieves geometric mean speedups of $7.96\times$ and $7.04\times$ over aggregation with the compressed sparse column (CSC) and compressed sparse row (CSR) formats, respectively. It also reduces memory traffic by $3.29\times$ and $4.37\times$ relative to CSC and CSR, respectively. Thus, the proposed aggregation format reduces both latency and memory accesses for GNN inference.
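To make the Z-Morton ordering mentioned in the abstract concrete, the following is a minimal Python sketch that sorts the nonzero coordinates of a sparse adjacency matrix along a Z-order curve by interleaving the bits of the row and column indices. It illustrates only the general ordering technique. The SCV layout and the partitioning scheme are not specified in the abstract and are not reproduced here. The function names (`part1by1`, `morton_key`, `z_order`), the 16-bit index limit, and the choice to place row bits in the higher interleaved positions are all assumptions made for this example.

```python
def part1by1(x: int) -> int:
    """Spread the low 16 bits of x so a zero bit separates each original bit."""
    x &= 0x0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton_key(row: int, col: int) -> int:
    """Interleave row and column bits (row in the odd bit positions).

    Assumption for this sketch: indices fit in 16 bits.
    """
    return (part1by1(row) << 1) | part1by1(col)


def z_order(nonzeros):
    """Return (row, col) coordinates sorted along the Z-order curve."""
    return sorted(nonzeros, key=lambda rc: morton_key(rc[0], rc[1]))


# Hypothetical nonzero coordinates of a small adjacency matrix.
edges = [(3, 3), (0, 1), (2, 0), (1, 0), (0, 2), (1, 1), (0, 0)]
ordered = z_order(edges)
print(ordered)
```

With this key, coordinates that fall in the same 2x2 block (and, recursively, the same 4x4 block, and so on) become adjacent in the sorted order, which is the data-locality property that a Z-order traversal is meant to provide.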