Machine learning algorithms are being used more frequently in the first-level triggers of collider experiments, with Graph Neural Networks pushing the hardware requirements of FPGA-based triggers beyond the current state of the art. To meet the stringent demands of high-throughput, low-latency environments, we propose a concept for latency-optimized preprocessing of sparse sensor data, enabling efficient GNN hardware acceleration by removing dynamic input sparsity. Our approach rearranges data arriving on a large number of First-In-First-Out (FIFO) interfaces, typically sensor frontends, onto a smaller number of FIFO interfaces connected to a machine learning hardware accelerator. To achieve high throughput while minimizing hardware utilization, we developed a hierarchical sparsity-compression pipeline optimized for FPGAs. We implemented our concept in the Chisel design language as an open-source hardware generator. As a demonstration, we deployed one configuration of our module as a preprocessing stage in a GNN-based first-level trigger for the Electromagnetic Calorimeter of the Belle II detector. Additionally, we evaluate latency, throughput, resource utilization, and scalability across a wide range of parameters to enable broader use in other large-scale scientific experiments.
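The core idea of removing dynamic input sparsity, funneling valid hits from many mostly-empty input FIFOs onto a few dense output FIFOs, can be illustrated with a small functional model. This is a hypothetical Python sketch of the behavior only, not the actual Chisel generator; the function name, round-robin output assignment, and use of `None` for empty slots are illustrative assumptions.

```python
from collections import deque
from itertools import zip_longest

def compress_streams(input_fifos, num_outputs):
    """Functional model (assumption, not the Chisel implementation):
    funnel hits from many sparse input FIFOs into fewer dense output FIFOs.

    Each input FIFO is a list of per-cycle slots holding either a hit
    payload or None (no data). Valid hits are concentrated round-robin
    onto the outputs, so the downstream accelerator only ever sees
    dense streams with the dynamic sparsity removed.
    """
    outputs = [deque() for _ in range(num_outputs)]
    idx = 0
    # Walk the inputs cycle by cycle, dropping empty slots.
    for cycle in zip_longest(*input_fifos):
        for slot in cycle:
            if slot is not None:
                outputs[idx % num_outputs].append(slot)
                idx += 1
    return outputs
```

In hardware, the same concentration step is pipelined hierarchically (e.g. as a tree of pairwise merge stages) so that throughput scales with the number of frontends while per-stage fan-in, and thus resource cost, stays bounded; this sequential model only captures the input/output relationship.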