DTMM is a library designed for efficient deployment and execution of machine learning models on resource-constrained IoT devices such as microcontroller units (MCUs). It is motivated by the emerging field of tiny machine learning (TinyML), which seeks to extend machine learning to the many low-end IoT devices and thereby achieve ubiquitous intelligence. Because embedded devices have such limited capability, a model must be compressed by pruning a sufficient number of weights before deployment. Although pruning has been studied extensively on many computing platforms, two key requirements become harder to meet on MCUs: models must be deeply compressed without significantly compromising accuracy, and they must execute efficiently after pruning. Current solutions achieve only one of these objectives, not both. In this paper, we find that pruned models have great potential for efficient deployment and execution on MCUs. We therefore propose DTMM, which combines pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill this gap in the efficient deployment and execution of pruned models. DTMM can be integrated into commercial ML frameworks for practical deployment, and we have developed a prototype system. Extensive experiments on various models show promising gains over state-of-the-art methods.