Graph Convolutional Networks (GCNs) are widely adopted for tasks involving relational or graph-structured data and can be formulated as two-stage sparse-dense matrix multiplication (SpMM) during inference. However, existing accelerators often struggle with the irregular workloads induced by power-law node degree distributions. In this work, we propose FlexVector, a vector-processor-based architecture that efficiently accelerates SpMM for GCN inference. To address irregular computation patterns, FlexVector adopts a row-wise, product-based dataflow that regularizes SpMM execution and exposes vector parallelism through full-row access to vector registers, eliminating the need for multi-banked register file designs. Building on this dataflow, it introduces software-managed, flexible vector register files (VRFs) that adapt to irregular data access patterns, without sacrificing memory access efficiency. To further exploit these architectural capabilities, we develop a graph-aware preprocessing and node partitioning strategy that restructures irregular graph workloads to better match the row-wise dataflow and VRF capacity. This hardware-software co-design reduces memory traffic, leading to significant performance and energy efficiency gains on real-world GCN workloads. Experimental results on five real-world GCN datasets show that the VRF-centric FlexVector achieves a 3.78x speedup and 40.5% lower energy at comparable area cost relative to a state-of-the-art cache-centric baseline with buffers of the same size.
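The two-stage SpMM formulation and the row-wise product dataflow mentioned above can be illustrated with a minimal pure-Python sketch. It is not FlexVector's implementation. The function names (`csr_spmm_row_wise`, `dense_to_csr`, `gcn_layer`), the CSR storage format, and the choice to compute the feature-weight product before the adjacency product are all assumptions made for illustration, and the activation function is omitted.

```python
# Illustrative sketch only; all names here are hypothetical, not from the paper.

def csr_spmm_row_wise(indptr, indices, data, dense):
    """Compute S @ dense for a CSR matrix S using a row-wise product dataflow."""
    n_rows = len(indptr) - 1
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        acc = out[i]  # one output row, playing the role of a vector register
        for p in range(indptr[i], indptr[i + 1]):
            k, a = indices[p], data[p]
            row = dense[k]  # a whole row of the dense operand is read at once
            for j in range(n_cols):  # a vector unit would handle this loop as one op
                acc[j] += a * row[j]
    return out


def dense_to_csr(matrix):
    """Convert a list-of-lists matrix into (indptr, indices, data) CSR arrays."""
    indptr, indices, data = [0], [], []
    for row in matrix:
        for k, v in enumerate(row):
            if v != 0:
                indices.append(k)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


def gcn_layer(adj_csr, feat_csr, weights):
    """Two-stage SpMM (assumed order): X @ W first, then A @ (X W)."""
    xw = csr_spmm_row_wise(*feat_csr, weights)  # stage 1: sparse X times dense W
    return csr_spmm_row_wise(*adj_csr, xw)      # stage 2: sparse A times dense XW


A = dense_to_csr([[1, 1, 0], [1, 1, 1], [0, 1, 1]])  # toy adjacency with self-loops
X = dense_to_csr([[1, 0], [0, 2], [3, 0]])           # toy sparse feature matrix
W = [[1, 2], [3, 4]]                                 # toy dense weight matrix
H = gcn_layer(A, X, W)
```

In this sketch each nonzero of the sparse operand scales one full row of the dense operand and accumulates it into one output row, so the inner loop always touches contiguous rows regardless of how many nonzeros a node has. The number of iterations of the middle loop still varies with node degree, which is the irregularity the abstract attributes to power-law degree distributions.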