Transformers have improved drastically the performance of natural language processing (NLP) and computer vision applications. The computation of transformers involves matrix multiplications and non-linear activation functions such as softmax and GELU (Gaussion Error Linear Unit) that are accelerated directly in hardware. Currently, function evaluation is done separately for each function and rarely allows for hardware reuse. To mitigate this problem, in this work, we map the computation of GELU to a softmax operator. In this way, the efficient hardware units designed already for softmax can be reused for computing GELU as well. Computation of GELU can enjoy the inherent vectorized nature of softmax and produce in parallel multiple GELU outcomes. Experimental results show that computing GELU via a pre-existing and incrementally modified softmax hardware unit (a) does not reduce the accuracy of representative NLP applications and (b) allows the reduction of the overall hardware area and power by 6.1% and 11.9%, respectively, on average.
翻译:Transformer大幅提升了自然语言处理(NLP)和计算机视觉应用的性能。其计算过程涉及矩阵乘法以及Softmax和GELU(高斯误差线性单元)等非线性激活函数,这些函数均通过硬件直接加速。目前,各函数的计算独立进行,几乎无法实现硬件复用。为解决此问题,本研究将GELU的计算映射至Softmax算子。由此,已为Softmax设计的高效硬件单元亦可复用于GELU的计算。GELU计算可继承Softmax固有的向量化特性,并行生成多个GELU结果。实验表明,通过预存且经增量修改的Softmax硬件单元计算GELU:(a)不会降低代表性NLP应用的精度;(b)平均可使整体硬件面积和功耗分别降低6.1%和11.9%。