Graph Neural Networks (GNNs) have recently gained attention due to their performance on non-Euclidean data. Custom hardware architectures prove particularly beneficial for GNNs due to their irregular memory access patterns, which result from the sparse structure of graphs. However, existing FPGA accelerators are limited by their double-buffering mechanism, which does not account for the irregular node distribution in typical graph datasets. To address this, we introduce \textbf{AMPLE} (Accelerated Message Passing Logic Engine), an FPGA accelerator leveraging a new event-driven programming flow. We develop a mixed-arithmetic architecture, enabling GNN inference to be quantized at node-level granularity. Finally, a prefetcher for data and instructions is implemented to optimize off-chip memory access and maximize node parallelism. Evaluation on citation and social media graph datasets ranging from $2$K to $700$K nodes showed a mean speedup of $243\times$ and $7.2\times$ against CPU and GPU counterparts, respectively.