LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

This work presents a novel reconfigurable architecture for Low Latency Graph Neural Network (LL-GNN) designs for particle detectors, delivering unprecedented low-latency performance. Incorporating FPGA-based GNNs into particle detectors is uniquely challenging: deploying the networks for online event selection in the Level-1 triggers of the CERN Large Hadron Collider experiments, where the data rate reaches hundreds of terabytes per second, requires sub-microsecond latency. This paper proposes a novel outer-product based matrix multiplication approach, which is enhanced by exploiting the structured adjacency matrix and a column-major data layout. In addition, a fusion step is introduced to further reduce the end-to-end design latency by eliminating unnecessary boundaries. A GNN-specific algorithm-hardware co-design approach is also presented, which not only finds a design with substantially lower latency but also finds a high-accuracy design under given latency constraints. To facilitate this, a customizable template for this low-latency GNN hardware architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 9.0 times faster and achieves up to 13.1 times higher power efficiency than a GPU implementation. Compared to previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is low enough to deploy GNNs in a sub-microsecond, real-time collider trigger system, allowing the trigger to benefit from the improved accuracy. The proposed LL-GNN design advances the next generation of trigger systems by enabling sophisticated algorithms to process experimental data efficiently.
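The outer-product formulation mentioned in the abstract can be illustrated with a minimal software sketch. This is not the authors' HLS template. It is plain Python under stated assumptions: the graph is fully connected without self-loops, the adjacency matrix is a binary "sender" matrix with exactly one 1 per column, and all function names (`matmul_outer`, `sender_matrix`, `gather_structured`) are hypothetical. The sketch shows that a product `X @ Rs` can be accumulated as a sum of rank-1 outer products, and that with such a structured matrix and a column-major feature layout the product reduces to copying whole feature columns, with no multiplications.

```python
def matmul_outer(A, B):
    """Compute C = A @ B as a sum of rank-1 outer products.

    Step t adds (column t of A) x (row t of B) to C, so each step needs
    only one column of A and one row of B.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for t in range(k):
        col = [A[i][t] for i in range(n)]  # column t of A
        row = B[t]                         # row t of B
        for i in range(n):
            for j in range(m):
                C[i][j] += col[i] * row[j]
    return C


def edges_fully_connected(n_nodes):
    """Assumed edge order: all (sender, receiver) pairs with sender != receiver."""
    return [(s, r) for s in range(n_nodes) for r in range(n_nodes) if r != s]


def sender_matrix(n_nodes):
    """Binary n_nodes x n_edges matrix; entry [node][e] is 1 if node sends edge e."""
    edges = edges_fully_connected(n_nodes)
    return [[1 if s == node else 0 for (s, _) in edges] for node in range(n_nodes)]


def gather_structured(X_cols, n_nodes):
    """Same result as X @ sender_matrix, but without any multiplication.

    X_cols is column-major: X_cols[node] is that node's feature vector.
    Because every column of the sender matrix is one-hot, each output
    column is just a copy of one input column.
    """
    return [list(X_cols[s]) for (s, _) in edges_fully_connected(n_nodes)]


if __name__ == "__main__":
    X = [[1, 2, 3],
         [4, 5, 6]]                        # 2 features x 3 nodes (row-major)
    X_cols = [list(c) for c in zip(*X)]    # column-major view of X
    dense = matmul_outer(X, sender_matrix(3))
    fast_cols = gather_structured(X_cols, 3)
    fast = [list(r) for r in zip(*fast_cols)]
    assert dense == fast
```

In hardware, the inner loops of such an outer-product step are independent and could in principle be unrolled, which is one plausible reason the formulation suits a low-latency design. The sketch makes no claim about how the open-sourced template actually schedules them.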

