Deep neural networks (DNNs) have been successfully applied in various fields. A major challenge in deploying DNNs, especially on edge devices, is power consumption, which is driven by the large number of multiply-and-accumulate (MAC) operations. To address this challenge, we propose PowerPruning, a novel method that reduces power consumption in digital neural network accelerators by selecting weights that consume less power in MAC operations. In addition, the timing characteristics of the selected weights are evaluated together with all activation transitions, and the weights and activations that lead to small delays are further selected. Consequently, the maximum delay of the sensitized circuit paths in the MAC units is reduced without modifying the MAC units themselves, which allows the supply voltage to be scaled flexibly to reduce power consumption further. Together with retraining, the proposed method can reduce the power consumption of DNNs on hardware by up to 78.3% with only a slight accuracy loss.
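The weight-selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 8-bit integer weight range, the budget of 32 allowed values, and the power table are all assumptions made for illustration. In particular, the power table below is a placeholder (a bit count of the two's-complement encoding), not measured MAC power, which would have to come from circuit-level characterization. The last helper uses the standard first-order CMOS model, in which dynamic power scales with the square of the supply voltage, to show why a shorter maximum delay that permits a lower voltage saves additional power.

```python
import numpy as np

def select_low_power_values(power_per_value, keep):
    """Return the `keep` weight values with the lowest estimated power, sorted."""
    ranked = sorted(power_per_value, key=power_per_value.get)
    return np.sort(np.array(ranked[:keep]))

def project_to_allowed(weights, allowed):
    """Map every integer weight to the nearest value in `allowed`."""
    allowed = np.asarray(allowed)
    # Broadcast to shape (..., len(allowed)) and pick the closest allowed value.
    idx = np.abs(weights[..., None] - allowed).argmin(axis=-1)
    return allowed[idx]

def dynamic_power_ratio(v_new, v_old):
    """First-order CMOS model: dynamic power is proportional to V^2."""
    return (v_new / v_old) ** 2

# Hypothetical per-value cost for 8-bit weights: placeholder, NOT measured data.
power_table = {w: bin(w & 0xFF).count("1") for w in range(-128, 128)}

allowed = select_low_power_values(power_table, keep=32)
weights = np.array([[-128, 3, 77], [0, 127, -5]])
projected = project_to_allowed(weights, allowed)
```

In a full flow, the projection step would be applied during retraining so that the network can recover accuracy while its weights stay within the allowed set. A second filtering pass over the allowed values, based on delay rather than power, would follow the same pattern with a delay table in place of the power table.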