Specialized compute blocks have been developed for efficient DNN execution. However, because of the large volume of data and parameter movement, the interconnects and on-chip memories form another bottleneck that impairs both power and performance. This work addresses this bottleneck with a low-power technique for edge-AI inference engines that combines overhead-free coding with a statistical analysis of the data and parameters of neural networks. On state-of-the-art benchmarks, our approach reduces interconnect and memory power consumption by up to 80% while also reducing the power consumption of the compute blocks by up to 39%. These improvements come with no loss of accuracy and negligible hardware cost.
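The abstract does not name the specific code used, so the following is only an illustrative sketch of why an overhead-free code informed by data statistics can cut interconnect power. It assumes that dynamic bus power scales with switching activity (the number of bit toggles between consecutive words). It uses sign-magnitude encoding as a stand-in example of an overhead-free code (same 8-bit width as two's complement) and uses synthetic Gaussian values near zero as hypothetical NN weights. None of the names below come from the original work.

```python
import random

WIDTH = 8  # assumed bus width in bits (int8 values)
MASK = (1 << WIDTH) - 1


def twos_complement(v: int) -> int:
    """Encode a signed integer in [-128, 127] as an 8-bit two's complement word."""
    return v & MASK


def sign_magnitude(v: int) -> int:
    """Encode a signed integer in [-127, 127] as an 8-bit sign-magnitude word.

    Same width as two's complement, so no extra wires are needed
    (the only lost code point is -128).
    """
    if not -127 <= v <= 127:
        raise ValueError("sign-magnitude int8 covers only [-127, 127]")
    sign = 1 << (WIDTH - 1) if v < 0 else 0
    return sign | abs(v)


def toggles(words) -> int:
    """Count bit transitions between consecutive words on a bus (switching activity)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def compare(values):
    """Return (two's complement toggles, sign-magnitude toggles) for a value stream."""
    tc = toggles([twos_complement(v) for v in values])
    sm = toggles([sign_magnitude(v) for v in values])
    return tc, sm


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-in for trained weights: small values clustered around 0.
    weights = [max(-127, min(127, round(random.gauss(0, 8)))) for _ in range(10_000)]
    tc, sm = compare(weights)
    print(f"two's complement toggles: {tc}")
    print(f"sign-magnitude toggles:   {sm}")
    print(f"reduction: {100 * (1 - sm / tc):.1f}%")
```

For example, `compare([1, -1, 1, -1])` returns `(21, 3)`. In two's complement, each sign flip between 1 and -1 toggles 7 bits (`0x01` to `0xFF`). In sign-magnitude, it toggles only the sign bit (`0x01` to `0x81`). Because trained weights and activations tend to cluster around zero with frequent sign changes, a code that exploits this statistic can reduce switching activity without widening the bus.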