Deploying Vision Transformers (ViTs) on hardware platforms, especially Field-Programmable Gate Arrays (FPGAs), is challenging, mainly because of the substantial computational and power requirements of their non-linear functions, notably layer normalization, softmax, and the Gaussian Error Linear Unit (GELU). These functions are hard to implement efficiently in hardware because they involve complex mathematical operations and because FPGAs have limited resource counts and architectural constraints. PEANO-ViT streamlines the implementation of layer normalization by introducing a division-free technique that simultaneously approximates the division and square root functions. PEANO-ViT also provides a multi-scale division strategy that eliminates division operations in the softmax layer, aided by a Padé-based approximation of the exponential function. Finally, PEANO-ViT introduces a piecewise linear approximation of the GELU function, designed to bypass the computationally intensive operations associated with GELU. In our evaluations, PEANO-ViT exhibits minimal accuracy degradation (≤ 0.5% for DeiT-B) while significantly improving power efficiency, by 1.91x, 1.39x, and 8.01x for layer normalization, softmax, and GELU, respectively. This improvement comes from substantial reductions in the DSP, LUT, and register counts of these non-linear operations. Consequently, PEANO-ViT enables efficient deployment of Vision Transformers on resource- and power-constrained FPGA platforms.
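To make two of the approximation families named above concrete, the following is a minimal floating-point sketch, not PEANO-ViT's actual design. It assumes a textbook [2/2] Padé approximant for the exponential and a uniformly spaced piecewise linear table for GELU. The function names, the approximant order, the breakpoint range, and the step size are all hypothetical choices made for illustration; the abstract does not specify them. The rational Padé form still contains a division, and the multi-scale strategy that removes it is not reproduced here.

```python
import math


def pade_exp(x: float) -> float:
    """[2/2] Pade approximant of exp(x) around 0 (assumed order, for illustration)."""
    num = 12.0 + 6.0 * x + x * x
    den = 12.0 - 6.0 * x + x * x
    # This sketch keeps the division; a division-free variant is out of scope here.
    return num / den


def gelu_exact(x: float) -> float:
    """Reference GELU using the Gaussian CDF, written with math.erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


# Hypothetical table: uniform breakpoints on [-4, 4] with step 0.5.
# A power-of-two step means the segment index could be a bit shift in hardware.
STEP = 0.5
LO, HI = -4.0, 4.0
KNOTS = [LO + i * STEP for i in range(int((HI - LO) / STEP) + 1)]
VALUES = [gelu_exact(k) for k in KNOTS]


def gelu_pwl(x: float) -> float:
    """Piecewise linear GELU: 0 below LO, identity above HI, interpolation between."""
    if x <= LO:
        return 0.0
    if x >= HI:
        return x
    # Clamp the index so rounding near HI cannot step past the last segment.
    i = min(int((x - LO) / STEP), len(KNOTS) - 2)
    t = (x - KNOTS[i]) / STEP
    return VALUES[i] + t * (VALUES[i + 1] - VALUES[i])
```

Comparing `pade_exp(x)` with `math.exp(x)` and `gelu_pwl(x)` with `gelu_exact(x)` over a range of inputs shows how the error depends on the input range and the step size. In this sketch, a finer step or non-uniform breakpoints would trade table size for accuracy.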