For FPGA-based neural network accelerators, digital signal processing (DSP) blocks have traditionally been the cornerstone for handling multiplications. This paper introduces LUTMUL, which harnesses the potential of look-up tables (LUTs) for performing multiplications. LUTs typically outnumber DSPs by a factor of roughly 100, offering a significant computational advantage. By exploiting this advantage, our method demonstrates a potential boost in the performance of FPGA-based neural network accelerators with a reconfigurable dataflow architecture. Our approach challenges the conventional peak performance of DSP-based accelerators and sets a new benchmark for efficient neural network inference on FPGAs. Experimental results show that our design achieves the best inference speed among all FPGA-based accelerators, reaching a throughput of 1627 images per second while maintaining a top-1 accuracy of 70.95% on the ImageNet dataset.