Large Deep Neural Networks (DNNs) are the backbone of today's artificial intelligence because they make accurate predictions when trained on huge datasets. With advancing technologies such as the Internet of Things, interpreting the large quantities of data generated by sensors is becoming an increasingly important task. In many applications, however, the energy consumption of a deep learning model matters as much as its predictive performance. This paper investigates the efficient deployment of deep learning models on resource-constrained microcontroller architectures via network compression. We present a methodology for the systematic exploration of different DNN pruning, quantization, and deployment strategies, targeting different ARM Cortex-M based low-power systems. This exploration makes it possible to analyze trade-offs between key metrics such as accuracy, memory consumption, execution time, and power consumption. We discuss experimental results on three different DNN architectures and show that we can compress them to below 10% of their original parameter count before their predictive quality decreases. This compression also allows us to deploy and evaluate them on Cortex-M based microcontrollers.
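To make the two compression steps named in the abstract concrete, the following is a minimal sketch of pruning followed by quantization on a flat list of weights. The abstract does not say which pruning criterion or quantization scheme the authors use, so global magnitude pruning and symmetric 8-bit linear quantization are assumptions chosen for illustration, and the function names `magnitude_prune` and `quantize_int8` are hypothetical.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero all but the largest-magnitude `keep_fraction` of the weights.

    Assumption: magnitude pruning is used here only as an example criterion.
    """
    k = max(1, round(len(weights) * keep_fraction))
    # Indices of the k weights with the largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127].

    Returns the integer codes and the scale needed to dequantize them.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


# Toy weight vector (illustrative values, not data from the paper).
weights = [0.02, -0.9, 0.1, 0.45, -0.03, 0.005, -0.2, 0.6, 0.01, -0.07]

pruned = magnitude_prune(weights, keep_fraction=0.1)  # keep 10% of the weights
codes, scale = quantize_int8(pruned)
dequantized = [c * scale for c in codes]
```

Keeping 10% of the parameters mirrors the compression level reported in the abstract. In practice, pruning is usually interleaved with retraining so that accuracy can recover, which this sketch omits.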