In recent years, the number of Machine Learning (ML) inference applications deployed at the Edge has been growing rapidly, and Internet of Things (IoT) devices and microcontrollers (MCUs) are becoming increasingly mainstream in everyday activities. In this work we focus on the STM32 family of MCUs. We propose a novel methodology for CNN deployment on the STM32 family, targeting power optimization through effective clock exploration and configuration, together with decoupled access-execute convolution kernel execution. Our approach further optimizes power consumption through Dynamic Voltage and Frequency Scaling (DVFS) under various latency constraints, which composes an NP-complete optimization problem. We compare our approach against the state-of-the-art TinyEngine inference engine, as well as TinyEngine coupled with the power-saving modes of STM32 MCUs, showing that we achieve up to 25.2% lower energy consumption across varying QoS levels.
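For intuition, the DVFS decision described above can be viewed as selecting a clock configuration per layer (or kernel phase) so as to minimize total energy under a latency budget. A minimal illustrative formulation, with all symbols assumed here rather than taken from the paper's exact model, is:

\[
\min_{f_i \in \mathcal{F}} \;\sum_{i=1}^{N} P(f_i)\, t_i(f_i)
\quad \text{s.t.} \quad \sum_{i=1}^{N} t_i(f_i) \le T_{\mathrm{QoS}},
\]

where $\mathcal{F}$ is the set of available voltage/frequency configurations, $f_i$ the configuration chosen for layer $i$, $P(f_i)$ the corresponding power draw, $t_i(f_i)$ the resulting execution time, and $T_{\mathrm{QoS}}$ the latency constraint for the given QoS level.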