This work presents an efficient implementation of the transformer architecture on a Field-Programmable Gate Array (FPGA) using the \texttt{hls4ml} tool. Transformer models have proven effective across a wide range of problems, which makes their use in experimental triggers for particle physics a subject of significant interest. We implement critical components of a transformer model, such as multi-head attention and softmax layers. To evaluate the implementation, we apply it to a particle physics jet flavor tagging problem using a public dataset. We measure a latency below 2 $\mu$s on a Xilinx UltraScale+ FPGA, which is compatible with the hardware trigger requirements of the experiments at the CERN Large Hadron Collider.
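For readers unfamiliar with the two components named above, the following is a minimal floating-point software sketch of a softmax layer and multi-head attention in plain Python. It is only a reference for the arithmetic, not the firmware implementation described in this work: the function names (`softmax`, `project`, `attention_head`, `multi_head_attention`), the list-of-lists matrix layout, and the omission of the final output projection are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(rows, weight):
    """Multiply each row vector by a weight matrix of shape (in_dim, out_dim)."""
    out_dim = len(weight[0])
    return [
        [sum(r[i] * weight[i][j] for i in range(len(r))) for j in range(out_dim)]
        for r in rows
    ]


def attention_head(queries, keys, values):
    """Scaled dot-product attention for a single head."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


def multi_head_attention(x, heads):
    """Run each head on the same input and concatenate the results per position.

    `heads` is a list of hypothetical (Wq, Wk, Wv) projection matrices;
    the usual final output projection is left out to keep the sketch short.
    """
    per_head = [
        attention_head(project(x, wq), project(x, wk), project(x, wv))
        for wq, wk, wv in heads
    ]
    return [sum((head[t] for head in per_head), []) for t in range(len(x))]
```

A hardware implementation would typically replace the floating-point `math.exp` and division with operations better suited to the target device; those choices are outside the scope of this sketch.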