Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains, yet their acceleration on mainstream GPUs is hindered by workload imbalance and irregular memory access. To address these challenges, we present Accel-GCN, a GPU accelerator architecture for GCNs. The design of Accel-GCN comprises: (i) a lightweight degree sorting stage that groups nodes with similar degrees; (ii) a block-level partition strategy that dynamically adjusts warp workload sizes, improving shared memory locality and workload balance while reducing metadata overhead compared to designs such as GNNAdvisor; and (iii) a combined warp strategy that improves memory coalescing and computational parallelism along the column dimension of dense matrices. Building on these principles, we develop a kernel for sparse matrix multiplication (SpMM) in GCNs that employs block-level partitioning and the combined warp strategy. This approach improves performance and multi-level memory efficiency, and makes better use of memory bandwidth through memory coalescing and alignment. Evaluation of Accel-GCN across 18 benchmark graphs shows that it outperforms cuSPARSE, GNNAdvisor, and graph-BLAST by factors of 1.17×, 1.86×, and 2.94×, respectively. These results establish Accel-GCN as an effective solution for improving GCN computational efficiency.
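To make design points (i) and (ii) more concrete, the following is a minimal host-side sketch in Python, not the Accel-GCN kernel itself. It assumes the graph is stored in CSR form and illustrates how rows might be ordered by degree and then packed into blocks with a bounded number of nonzeros. The function names `degree_sort` and `partition_blocks`, the parameter `max_nnz_per_block`, and the greedy packing rule are all hypothetical choices made for illustration; the abstract does not specify how the actual partitioning is performed.

```python
def degree_sort(row_ptr):
    """Return row indices ordered by ascending degree, plus the degree list.

    row_ptr is a CSR row-pointer array; the degree of row i is
    row_ptr[i + 1] - row_ptr[i]. Python's sort is stable, so rows with
    equal degree keep their original relative order.
    """
    degrees = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    order = sorted(range(len(degrees)), key=lambda r: degrees[r])
    return order, degrees


def partition_blocks(order, degrees, max_nnz_per_block):
    """Greedily pack degree-sorted rows into blocks with a bounded nonzero count.

    Assumption for illustration only: a block is closed as soon as adding the
    next row would exceed max_nnz_per_block. A single row whose degree exceeds
    the budget still gets a block of its own.
    """
    blocks, current, current_nnz = [], [], 0
    for row in order:
        d = degrees[row]
        if current and current_nnz + d > max_nnz_per_block:
            blocks.append(current)
            current, current_nnz = [], 0
        current.append(row)
        current_nnz += d
    if current:
        blocks.append(current)
    return blocks


# Toy CSR row pointer for a 6-node graph with degrees 3, 1, 4, 1, 2, 1.
row_ptr = [0, 3, 4, 8, 9, 11, 12]
order, degrees = degree_sort(row_ptr)
blocks = partition_blocks(order, degrees, max_nnz_per_block=4)
print(order)   # rows sorted by degree
print(blocks)  # rows grouped into blocks of at most 4 nonzeros
```

In this toy setting, low-degree rows end up sharing a block while high-degree rows occupy a block each, so every block carries a comparable amount of work. That is the intuition behind pairing degree sorting with block-level partitioning, although a real GPU kernel would additionally have to map each block onto warps and shared memory.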