Deep neural networks (DNNs) have been successfully applied in various fields. A major challenge in deploying DNNs, especially on edge devices, is power consumption, which stems from the large number of multiply-and-accumulate (MAC) operations. To address this challenge, we propose PowerPruning, a novel method that reduces power consumption in digital neural network accelerators by selecting weights that lead to less power consumption in MAC operations. In addition, the timing characteristics of the selected weights, together with all activation transitions, are evaluated, and the weights and activations that lead to small delays are further selected. Consequently, the maximum delay of the sensitized circuit paths in the MAC units is reduced without modifying the MAC units, which allows flexible scaling of the supply voltage to reduce power consumption further. Together with retraining, the proposed method can reduce the power consumption of DNNs on hardware by up to 78.3% with only a slight accuracy loss.
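The weight-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how per-weight power is obtained, so `popcount_proxy` is a hypothetical placeholder cost (the number of set bits in the 8-bit two's-complement encoding), and the function names, the 8-bit weight range, and `keep=32` are all assumptions made for illustration. The delay-based selection and the retraining step are omitted.

```python
def popcount_proxy(w):
    """Hypothetical stand-in for a characterized per-weight power figure.

    Counts the set bits of the 8-bit two's-complement encoding of w.
    This is an assumption for illustration, not a measured power value.
    """
    return bin(w & 0xFF).count("1")


def select_allowed_weights(candidates, cost, keep):
    """Keep the `keep` candidate weight values with the lowest cost.

    Ties in cost are broken in favor of the smaller magnitude.
    """
    ranked = sorted(candidates, key=lambda w: (cost(w), abs(w)))
    return sorted(ranked[:keep])


def project_to_allowed(weights, allowed):
    """Replace each weight with the nearest value in the allowed set."""
    return [min(allowed, key=lambda a: abs(a - w)) for w in weights]


# Assumed setting: signed 8-bit quantized weights, 32 values retained.
allowed = select_allowed_weights(range(-128, 128), popcount_proxy, keep=32)

weights = [-100, 3, 77, 15, -1, 64]
restricted = project_to_allowed(weights, allowed)
print(restricted)
```

In a real flow, the placeholder cost would presumably be replaced by power figures characterized on the target MAC unit, and the projection would be applied during retraining so that the network can recover accuracy under the restricted weight set.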