Attention mechanisms are becoming increasingly popular and are used in neural network models across multiple domains, such as natural language processing (NLP) and vision, especially at the edge. However, attention layers are difficult to map onto existing neuro accelerators because they contain a much higher density of non-linear operations, which leads to inefficient utilization of today's vector units. This work introduces NOVA, a NoC-based Vector Unit that performs non-linear operations within the accelerator's NoC and can be overlaid onto existing neuro accelerators to map attention layers at the edge. Our results show that the NOVA architecture is up to 37.8x more power-efficient than state-of-the-art hardware approximators when running existing attention-based neural networks.
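To make the bottleneck concrete, the following is a minimal NumPy sketch of single-head scaled dot-product attention; it is purely illustrative and not NOVA's implementation, and the shapes and helper code are our own assumptions. The two matrix multiplications map well onto an accelerator's MAC arrays, while the softmax requires an exponential, a reduction, and a division over every entry of the seq_len x seq_len score matrix, i.e. O(n^2) non-linear operations per head; this is the density of non-linear work that overwhelms conventional vector units.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention (illustrative sketch)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # linear: MAC-friendly matmul
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    w = np.exp(scores)                            # non-linear: elementwise exp
    w /= w.sum(axis=-1, keepdims=True)            # non-linear: row-wise normalize
    return w @ v                                  # linear: MAC-friendly matmul

# Toy usage (hypothetical sizes): sequence of 8 tokens, head dimension 16.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = attention(q, k, v)
assert out.shape == (8, 16)
```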