We propose DSLOT-NN, a processing technique based on Digit-Serial Left-tO-righT (DSLOT) arithmetic that aims to accelerate inference of the convolution operation in deep neural networks (DNNs). The proposed work can assess and terminate ineffective convolutions, which results in substantial power and energy savings. The processing engine comprises low-latency most-significant-digit-first (MSDF), also called online, multipliers and adders that process data from left to right, allowing subsequent operations to execute in a digit-pipelined manner. Using online operators eliminates the need for a complex mechanism to identify negative activations, because the most significant output digit is generated first and the sign of the result can be determined as soon as the first non-zero digit is produced. The precision of the online operators can be tuned at run time, making them particularly useful where accuracy can be traded for power and energy savings. The proposed design has been implemented on a Xilinx Virtex-7 FPGA and compared with the state-of-the-art Stripes accelerator on various performance metrics. The results show that the proposed design reduces power consumption, has a shorter cycle time, and achieves approximately 50% higher OPS per watt.
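The early-termination idea can be illustrated with a small software model. The sketch below is not the hardware design described above; it assumes a radix-2 signed-digit stream with digits in {-1, 0, 1}, a first-digit weight of 2^-1, and a ReLU activation following the convolution sum. The function name `relu_with_early_termination` is hypothetical. Under these assumptions, the sign of the final value equals the sign of the first non-zero digit, so a negative result can be discarded without consuming the remaining digits.

```python
from typing import Iterable, Tuple


def relu_with_early_termination(digits: Iterable[int]) -> Tuple[float, int]:
    """Consume signed digits (-1, 0, 1), most significant digit first.

    Returns (relu_value, digits_consumed). Processing stops as soon as the
    first non-zero digit is negative, because the remaining digits can no
    longer make the result positive.
    """
    value = 0.0
    weight = 0.5          # assumed weight of the first digit: 2**-1
    sign_known = False
    consumed = 0

    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError(f"invalid signed digit: {d}")
        consumed += 1

        if not sign_known and d != 0:
            if d < 0:
                # Negative result: ReLU output is zero, terminate early.
                return 0.0, consumed
            sign_known = True

        value += d * weight
        weight /= 2

    return max(value, 0.0), consumed


if __name__ == "__main__":
    # Negative sum: terminated after the second digit.
    print(relu_with_early_termination([0, -1, 1, 1]))
    # Positive sum: all four digits are consumed.
    print(relu_with_early_termination([1, -1, 0, 1]))
```

In this model the number of digits consumed stands in for the cycles a digit-serial unit would spend; the savings reported for the actual design depend on the hardware and workload and are not reproduced by this sketch.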