Developing deep learning models on tiny devices (e.g., microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g., transformers) on tiny devices because of their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develop and deploy resource-efficient transformers on MCUs. TinyFormer consists mainly of three components: SuperNAS, SparseNAS, and SparseEngine. SuperNAS searches a vast search space for an appropriate supernet. SparseNAS then evaluates sparse single-path models within the identified supernet to select the best one, including its transformer architecture. Finally, SparseEngine efficiently deploys the searched sparse models onto MCUs. To the best of our knowledge, SparseEngine is the first deployment framework capable of performing inference of sparse models with transformers on MCUs. Evaluation results on the CIFAR-10 dataset demonstrate that TinyFormer can develop efficient transformers with an accuracy of $96.1\%$ while adhering to hardware constraints of $1$MB storage and $320$KB memory. Additionally, TinyFormer achieves speedups of up to $12.2\times$ in sparse inference compared to the CMSIS-NN library. We believe TinyFormer can bring powerful transformers into TinyML scenarios and greatly expand the scope of deep learning applications.
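This abstract does not describe SparseEngine's internal sparse format. As a generic illustration of why sparse inference can be cheaper than dense inference, the sketch below uses the standard compressed sparse row (CSR) layout. The layout is an assumption, not the authors' format, and the helper names `dense_to_csr` and `csr_matvec` are hypothetical. The inner loop of the matrix-vector product visits only the nonzero weights, so its work scales with the number of nonzeros instead of the full matrix size.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    Hypothetical helper: a generic CSR layout, not SparseEngine's format.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for c, w in enumerate(row):
            if w != 0:
                values.append(w)   # keep only nonzero weights
                col_idx.append(c)  # remember which column each one came from
        row_ptr.append(len(values))  # row r spans values[row_ptr[r]:row_ptr[r+1]]
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = W @ x, touching only the nonzero entries of W."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Usage: a 3x4 weight matrix with 4 nonzeros out of 12 entries.
W = [[2, 0, 0, -1],
     [0, 0, 3, 0],
     [0, 4, 0, 0]]
vals, cols, ptr = dense_to_csr(W)
print(csr_matvec(vals, cols, ptr, [1, 2, 3, 4]))  # [-2, 9, 8]
```

On an MCU the same loop would typically be written in C over int8 weights with int32 accumulation. CSR also stores an index per nonzero, so it saves storage only when the weights are sparse enough to offset that overhead.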