We propose DSLOT-NN, a processing technique based on Digit-Serial Left-tO-righT (DSLOT) arithmetic, with the aim of accelerating inference of the convolution operation in deep neural networks (DNNs). The proposed design can detect and terminate ineffective convolutions, which yields substantial power and energy savings. The processing engine is composed of low-latency most-significant-digit-first (MSDF) multipliers and adders, also called online operators. These operators process data from left to right, so subsequent operations can execute in a digit-pipelined manner. Using online operators removes the need for a complex mechanism to identify negative activations: the most significant digits of the output are generated first, so the sign of the result is known as soon as the first non-zero digit is produced. The precision of the online operators can be tuned at run time, which makes them especially useful when accuracy can be traded for power and energy savings. The proposed design has been implemented on a Xilinx Virtex-7 FPGA and compared with the state-of-the-art Stripes design on several performance metrics. The results show that the proposed design achieves power savings, a shorter cycle time, and approximately 50% higher OPS per watt.
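The early-termination argument can be illustrated with a small software model. This is a minimal sketch, not the DSLOT-NN hardware. It assumes radix-2 signed digits in {-1, 0, 1} and fractional values 0.d1 d2 ... dn. It does not model the online multipliers, the adder tree, or online delay: the input list simply stands in for the MSDF digit stream of one convolution output. The function names `sd_value` and `relu_msdf` are hypothetical. The sign is fixed by the first non-zero digit d_k because the remaining digits can contribute at most 2^-k - 2^-n in magnitude, which is strictly less than 2^-k. The `precision` argument models run-time precision tuning by truncating the stream.

```python
from fractions import Fraction
from itertools import islice
from typing import Iterable, Tuple


def sd_value(digits: Iterable[int]) -> Fraction:
    """Exact value of the radix-2 signed-digit fraction 0.d1 d2 ... dn."""
    return sum((Fraction(d, 2 ** j) for j, d in enumerate(digits, start=1)), Fraction(0))


def relu_msdf(digit_stream: Iterable[int], precision: int) -> Tuple[Fraction, int]:
    """Hypothetical ReLU over an MSDF digit stream with early termination.

    Reads at most `precision` digits (run-time precision tuning) and returns
    (result, digits_consumed).
    """
    value = Fraction(0)
    consumed = 0
    sign_known = False
    for j, d in enumerate(islice(digit_stream, precision), start=1):
        if d not in (-1, 0, 1):
            raise ValueError("signed digits must be in {-1, 0, 1}")
        consumed = j
        if not sign_known and d != 0:
            sign_known = True
            if d < 0:
                # The first non-zero digit is negative, so the whole output is
                # negative: ReLU yields 0 and the remaining digits are never read.
                return Fraction(0), consumed
        value += Fraction(d, 2 ** j)
    return value, consumed


# Usage: the first output is negative (-3/64) and is terminated after 2 digits.
neg = [0, -1, 1, 1, 0, 1, 0, 0]
pos = [0, 1, -1, 0, 1, 0, -1, 1]
print(sd_value(neg), relu_msdf(neg, 8))   # -3/64 (Fraction(0, 1), 2)
print(relu_msdf(pos, 8))                  # (Fraction(39, 256), 8)
print(relu_msdf(pos, 4))                  # (Fraction(1, 8), 4), reduced precision
```

In this model, a negative output costs only as many digit steps as it takes to reach its first non-zero digit. That saved work corresponds to the ineffective convolutions the abstract describes terminating.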