Modern Neural Network (NN) architectures rely heavily on vast numbers of multiply-accumulate operations, which constitute their predominant computational cost. This paper therefore proposes a high-throughput, scalable, and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of NNs. We first streamline the inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication, to design a fast, efficient, and scalable approximate matrix multiplication module termed the "Approximate Multiplication Unit (AMU)". The AMU further optimizes LUT-based matrix multiplication through dedicated memory management and access design, decoupling the computational overhead from the input resolution and significantly boosting the efficiency of FPGA-based NN accelerators. Experimental results show that the AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art FPGA-based Quantised Neural Network (QNN) accelerators.
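To make the LUT-based idea behind MADDNESS concrete, the following is a minimal NumPy sketch of approximate matrix multiplication via codebooks and precomputed lookup tables. All shapes, names, and the nearest-prototype encoder below are illustrative assumptions, not the paper's actual AMU design (MADDNESS itself learns a hashing tree rather than using nearest-neighbour search):

```python
import numpy as np

# Illustrative sketch of LUT-based approximate matmul (MADDNESS-style).
# Sizes are assumptions chosen for readability, not from the paper.
rng = np.random.default_rng(0)

C, K = 4, 16        # C codebooks, K prototypes per codebook
D, N = 32, 8        # input dimension D, output columns N
d = D // C          # each codebook covers a disjoint slice of the input
B = rng.standard_normal((D, N))          # fixed weight matrix (e.g. an NN layer)
protos = rng.standard_normal((C, K, d))  # "learned" prototypes (random here)

# Offline: precompute lookup tables of shape (C, K, N) holding the dot
# product of every prototype with the matching slice of B.
luts = np.einsum('ckd,cdn->ckn', protos, B.reshape(C, d, N))

def approx_matmul(a):
    """Approximate a @ B for one input row `a` of length D."""
    out = np.zeros(N)
    for c in range(C):
        sub = a[c * d:(c + 1) * d]
        # Encode: index of the nearest prototype in this subspace.
        code = np.argmin(((protos[c] - sub) ** 2).sum(axis=1))
        # Accumulate the precomputed partial product: at inference time
        # only table reads and additions remain, no multiplications.
        out += luts[c, code]
    return out

approx = approx_matmul(rng.standard_normal(D))
```

The sketch shows why such a unit maps well to FPGAs: once the tables are built, the inner loop is pure memory lookups and accumulation, which is the access pattern the AMU's memory management is designed around.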