Graph neural network (GNN) inference faces a significant bottleneck in preprocessing, which often dominates overall inference latency. We introduce AutoGNN, an FPGA-based accelerator that addresses this challenge by leveraging the FPGA's reconfigurability and specialized components. AutoGNN adapts to diverse graph inputs, efficiently performing computationally intensive tasks such as graph format conversion and sampling. By utilizing components such as adder trees, AutoGNN executes reduction operations in constant time, overcoming the serialization and synchronization limitations of GPUs. AutoGNN integrates unified processing elements (UPEs) and single-cycle reducers (SCRs) to streamline GNN preprocessing: UPEs enable scalable parallel processing for edge sorting and unique vertex selection, while SCRs efficiently handle sequential tasks such as pointer array construction and subgraph reindexing. A user-level software framework dynamically profiles graph inputs, determines optimal configurations, and reprograms AutoGNN to handle varying workloads. Implemented on a 7\,nm enterprise FPGA, AutoGNN achieves up to 9.0$\times$ and 2.1$\times$ speedup over conventional and GPU-accelerated preprocessing systems, respectively, enabling high-performance GNN preprocessing across diverse datasets.
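To make the preprocessing pipeline concrete, the sketch below walks through the four steps the abstract names: edge sorting, unique-vertex selection, pointer-array (CSR) construction, and subgraph reindexing. This is a minimal software reference in plain Python, not AutoGNN's hardware implementation; the function and variable names are hypothetical.

```python
def preprocess_subgraph(edge_list):
    """Illustrative software sketch of GNN subgraph preprocessing.

    Returns a CSR-style (indptr, indices) pair over compact local vertex
    IDs, plus the sorted list of unique global vertex IDs.
    """
    # 1. Edge sorting by (src, dst) -- parallelized across UPEs in hardware.
    edges = sorted(edge_list)

    # 2. Unique-vertex selection: the vertices the sampled subgraph touches.
    uniq = sorted({v for e in edges for v in e})

    # 3. Subgraph reindexing: map global vertex IDs to compact local IDs.
    remap = {v: i for i, v in enumerate(uniq)}
    local = [(remap[s], remap[d]) for s, d in edges]

    # 4. Pointer-array construction: per-source degree counts followed by a
    #    prefix sum -- the reduction step an adder tree can collapse to
    #    constant depth in hardware.
    counts = [0] * len(uniq)
    for s, _ in local:
        counts[s] += 1
    indptr = [0]
    for c in counts:
        indptr.append(indptr[-1] + c)

    indices = [d for _, d in local]
    return indptr, indices, uniq


# Usage: a 3-edge sampled subgraph over global vertex IDs {10, 20, 30}.
indptr, indices, uniq = preprocess_subgraph([(10, 30), (10, 20), (30, 10)])
# indptr  -> [0, 2, 2, 3]   (local vertex 0 has 2 out-edges, 1 has 0, 2 has 1)
# indices -> [1, 2, 0]      (destinations in local IDs)
# uniq    -> [10, 20, 30]   (global IDs, reindexed to 0, 1, 2)
```

On a GPU, step 4's prefix sum requires a multi-pass scan with synchronization between passes; the abstract's claim is that a dedicated adder tree removes that serialization.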