GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, GCN inference over large graph datasets can be notoriously challenging. This limits the application of GCNs to large real-world graphs and hinders the exploration of deeper and more sophisticated GCN models. The difficulty arises because real-world graphs can be extremely large and sparse. Furthermore, the node degrees of these graphs tend to follow a power-law distribution, so their adjacency matrices are highly irregular. This irregularity causes prohibitive inefficiencies in both data processing and data movement, which substantially limits the achievable GCN acceleration efficiency. To this end, this paper proposes a GCN algorithm and accelerator Co-Design framework, dubbed GCoD, which can largely alleviate the aforementioned irregularity and boost GCNs' inference efficiency. Specifically, at the algorithm level, GCoD integrates a split-and-conquer GCN training strategy that polarizes the graphs to be either denser or sparser in local neighborhoods without compromising model accuracy. The resulting graph adjacency matrices (mostly) have merely two levels of workload and enjoy largely enhanced regularity, which makes them easier to accelerate. At the hardware level, we further develop a dedicated two-pronged accelerator with a separate engine for each of the denser and sparser workloads, further boosting overall utilization and acceleration efficiency. Extensive experiments and ablation studies validate that GCoD consistently reduces the number of off-chip accesses. This leads to speedups of 15286x, 294x, 7.8x, and 2.5x over CPUs, GPUs, and the prior-art GCN accelerators HyGCN and AWB-GCN, respectively, while maintaining or even improving task accuracy. Code is available at https://github.com/RICE-EIC/GCoD.
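The two-workload idea can be illustrated with a small, hypothetical sketch. It assumes the nodes have already been reordered so that densely connected neighborhoods occupy contiguous index ranges, which is the structure the polarizing training is described as encouraging. Intra-block edges are then routed to a regular, dense block computation, and the leftover inter-block edges go to an irregular sparse loop. The two partial results add up to the ordinary neighbor aggregation A·X. The function names and the fixed `block_size` partitioning are illustrative assumptions, not GCoD's actual training algorithm or accelerator dataflow.

```python
# Hypothetical sketch (not GCoD's implementation): split neighbor aggregation
# A @ X into a regular "dense" workload and an irregular "sparse" workload.
# Assumes nodes were already reordered so dense neighborhoods are contiguous.

def split_workloads(edges, block_size):
    """Route each weighted edge (row, col, w) to a dense diagonal block
    or to the leftover sparse edge list."""
    dense_blocks = {}   # block id -> block_size x block_size weight matrix
    sparse_edges = []   # inter-block edges kept as (row, col, w)
    for r, c, w in edges:
        if r // block_size == c // block_size:
            b = r // block_size
            if b not in dense_blocks:
                dense_blocks[b] = [[0.0] * block_size for _ in range(block_size)]
            dense_blocks[b][r % block_size][c % block_size] += w
        else:
            sparse_edges.append((r, c, w))
    return dense_blocks, sparse_edges


def aggregate(dense_blocks, sparse_edges, features, block_size):
    """Compute A @ X as (dense-block part) + (sparse-edge part)."""
    n, f = len(features), len(features[0])
    out = [[0.0] * f for _ in range(n)]
    # "Dense engine": regular multiply-accumulate over each diagonal block.
    for b, block in dense_blocks.items():
        base = b * block_size
        for i in range(block_size):
            gi = base + i
            if gi >= n:  # the last block may be only partially filled
                break
            for j in range(block_size):
                gj = base + j
                if gj >= n:
                    break
                a = block[i][j]
                for k in range(f):
                    out[gi][k] += a * features[gj][k]
    # "Sparse engine": irregular scatter over the remaining edges.
    for r, c, w in sparse_edges:
        for k in range(f):
            out[r][k] += w * features[c][k]
    return out


def aggregate_reference(edges, features):
    """Plain edge-list aggregation, used only to check the split version."""
    n, f = len(features), len(features[0])
    out = [[0.0] * f for _ in range(n)]
    for r, c, w in edges:
        for k in range(f):
            out[r][k] += w * features[c][k]
    return out


if __name__ == "__main__":
    # Two 3-node clusters (0-2 and 3-5) joined by a single undirected edge 2-3.
    edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0),
             (3, 4, 1.0), (4, 3, 1.0), (4, 5, 1.0), (5, 4, 1.0),
             (2, 3, 1.0), (3, 2, 1.0)]
    features = [[float(i), 1.0] for i in range(6)]
    dense, sparse = split_workloads(edges, block_size=3)
    print(len(dense), "dense blocks,", len(sparse), "sparse edges")
    print(aggregate(dense, sparse, features, block_size=3))
```

In this toy graph, eight of the ten edges land in the two dense diagonal blocks and only the two bridging edges remain for the sparse path. That split is the kind of regularity a dedicated dense engine could exploit, while a separate engine absorbs the irregular remainder.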
