FPGAs have distinct advantages as a technology for deploying deep neural networks (DNNs) at the edge. Lookup Table (LUT) based networks, where neurons are directly modelled using LUTs, help maximize this promise of offering ultra-low latency and high area efficiency on FPGAs. Unfortunately, LUT resource usage scales exponentially with the number of inputs to the LUT, restricting PolyLUT to small LUT sizes. This work introduces PolyLUT-Add, a technique that enhances neuron connectivity by combining $A$ PolyLUT sub-neurons via addition to improve accuracy. Moreover, we describe a novel architecture to improve its scalability. We evaluated our implementation over the MNIST, Jet Substructure classification, and Network Intrusion Detection benchmarks and found that, for similar accuracy, PolyLUT-Add achieves a LUT reduction of $1.3$-$7.7\times$ with a $1.2$-$2.2\times$ decrease in latency.
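The core idea, summing the outputs of $A$ small polynomial sub-neurons before the activation, can be illustrated with a minimal sketch. This is a hypothetical software model, not the paper's hardware implementation: the function names, degree-2 polynomial feature expansion, and ReLU activation are illustrative assumptions. Each sub-neuron sees only a small fan-in subset of the inputs, so after training each can be absorbed into a single LUT; the addition stage is what widens the effective connectivity.

```python
import itertools
import numpy as np

def poly_features(x, degree):
    """All monomials of the inputs up to the given degree, e.g.
    degree 2 over (x0, x1) yields 1, x0, x1, x0^2, x0*x1, x1^2."""
    feats = [1.0]
    for d in range(1, degree + 1):
        for combo in itertools.combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod([x[i] for i in combo]))
    return np.array(feats)

def polylut_add_neuron(x, fanin_sets, weights, degree=2):
    """Illustrative PolyLUT-Add-style neuron: A sub-neurons, each a
    polynomial over its own small fan-in subset, are summed before a
    single activation. Names and activation choice are assumptions."""
    acc = 0.0
    for subset, w in zip(fanin_sets, weights):
        # Each term below would map to one trained LUT in hardware.
        acc += float(w @ poly_features(x[subset], degree))
    return max(acc, 0.0)  # ReLU-style activation on the summed result

# Example: A=2 sub-neurons of fan-in 2 cover 4 inputs total. Since LUT
# cost grows exponentially in fan-in, two fan-in-2 LUTs plus an adder
# are far cheaper than one fan-in-4 polynomial LUT of the same reach.
x = np.array([1.0, 2.0, 0.5, -1.0])
fanin_sets = [np.array([0, 1]), np.array([2, 3])]
weights = [np.ones(6), np.zeros(6)]  # 6 degree-2 features per fan-in-2 subset
print(polylut_add_neuron(x, fanin_sets, weights))
```

The adder tree after the sub-neurons is the only part whose width grows with $A$, which is what makes the connectivity boost cheap relative to enlarging the fan-in of a single LUT.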