With the hardware offloading of network functions, network interface cards (NICs) now take on a large volume of stateful, high-precision, high-throughput tasks, for which timers are a critical enabling component. However, existing timer management schemes suffer from heavy software load, low precision, lack of hardware support for updates, and overflow problems. This paper proposes two novel priority-queue operations, update and group sorting, to enable hardware timer management. To the best of our knowledge, this work presents the first hardware priority queue that supports an update operation, which modifies the priority of an element already in the queue by composing and propagating basic operations. The group sorting mechanism ensures correct timing behavior after overflow by establishing a group boundary priority that changes both the sorting process and the positions at which elements are inserted. Implemented as a hybrid architecture of a one-dimensional (1D) systolic array and shift registers, our design is validated through packet-level simulations of flow table timeout management. Results show that a 4K-depth, 16-bit timer queue runs at over 500 MHz (175 Mpps, 12 ns precision) in a 28 nm process and at over 300 MHz (116 Mpps) on an FPGA. Notably, it reduces LUT and FF usage by 31% and 25%, respectively, compared with existing designs.
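The two proposed operations can be illustrated with a small software model. This is a hypothetical sketch, not the hardware design described above: a Python list stands in for the systolic-array/shift-register hybrid, `update` is built from two basic operations (remove, then insert), and the group boundary is assumed to be the current 16-bit time, so deadlines numerically below the boundary are treated as having overflowed into the next group and sort after all others. It further assumes every timeout is shorter than one 2^16 wrap period. The names `TimerQueue`, `insert`, `remove`, `update`, and `pop_expired` are illustrative.

```python
from dataclasses import dataclass

WIDTH = 16
MOD = 1 << WIDTH  # 16-bit timer values wrap around at 65536


@dataclass
class Entry:
    flow_id: int
    deadline: int  # 16-bit expiry time; may have wrapped past 0


class TimerQueue:
    """Hypothetical model of a sorted timer queue with update and group sorting.

    The list mimics a shift-register array: slot 0 holds the earliest deadline.
    Assumes all deadlines lie within one wrap period after the boundary.
    """

    def __init__(self, now: int = 0) -> None:
        # Group boundary priority (assumption: equal to the current time).
        self.boundary = now % MOD
        self.slots: list[Entry] = []

    def _key(self, deadline: int) -> int:
        # Deadlines below the boundary belong to the next (post-overflow) group,
        # so their modular distance from the boundary places them after the rest.
        return (deadline - self.boundary) % MOD

    def insert(self, flow_id: int, deadline: int) -> None:
        deadline %= MOD
        k = self._key(deadline)
        # Find the first slot with a larger key; later slots shift right.
        pos = len(self.slots)
        for i, e in enumerate(self.slots):
            if k < self._key(e.deadline):
                pos = i
                break
        self.slots.insert(pos, Entry(flow_id, deadline))

    def remove(self, flow_id: int) -> bool:
        for i, e in enumerate(self.slots):
            if e.flow_id == flow_id:
                del self.slots[i]  # later slots shift left to close the gap
                return True
        return False

    def update(self, flow_id: int, new_deadline: int) -> None:
        # Update composed from basic operations: remove the old entry, re-insert.
        self.remove(flow_id)
        self.insert(flow_id, new_deadline)

    def pop_expired(self, now: int) -> list[int]:
        now %= MOD
        limit = self._key(now)
        expired = []
        while self.slots and self._key(self.slots[0].deadline) <= limit:
            expired.append(self.slots.pop(0).flow_id)
        # Every remaining entry lies strictly after `now`, so moving the
        # boundary to `now` preserves their relative order (no re-sort needed).
        self.boundary = now
        return expired
```

For example, starting at time 65530, deadlines 65534, 65540 (which wraps to 4), and 65532 are kept in the order 65532, 65534, 4, whereas a plain numeric sort would wrongly place the wrapped deadline 4 first. Advancing the boundary only after expired entries are popped is what lets the model avoid a full re-sort when time itself wraps.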