One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template that couples RISC-V processors with hardwired accelerators, supported by an automated deployment flow. We demonstrate that Attention-based models can run within a tinyML power envelope on an octa-core cluster augmented with a dedicated accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving a leading-edge energy efficiency of 2960 GOp/J and a throughput of 154 GOp/s (0.65 V, 22 nm FD-SOI technology).
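To make the "quantized Attention" workload concrete, the sketch below shows what an 8-bit attention head computes: scores Q·Kᵀ accumulated in int32, a row-wise softmax, a weighted sum over V, and requantization back to int8. This is a minimal, illustrative reference in C, not the paper's hardwired accelerator datapath: the function and parameter names are assumptions, the float softmax and the float `score_scale`/`out_scale` requantization stand in for the integer softmax approximation and fixed-point multiplier/shift pairs a real accelerator would use.

```c
#include <stdint.h>
#include <math.h>

#define S 64   /* sequence length (illustrative) */
#define D 64   /* head dimension  (illustrative) */

/* Saturating requantization of a value to int8; a single float
 * scale is used here for readability only. */
static int8_t requant(float v, float scale) {
    v *= scale;
    if (v > 127.0f)  v = 127.0f;
    if (v < -128.0f) v = -128.0f;
    return (int8_t)lrintf(v);
}

/* One 8-bit attention head: out = softmax(Q K^T / sqrt(D)) V.
 * Scores accumulate in int32; the softmax is evaluated in float
 * purely for clarity. */
void attention_int8(const int8_t Q[S][D], const int8_t K[S][D],
                    const int8_t V[S][D], int8_t out[S][D],
                    float score_scale, float out_scale) {
    for (int i = 0; i < S; i++) {
        /* Row i of Q K^T, int32 accumulation. */
        int32_t score[S];
        for (int j = 0; j < S; j++) {
            int32_t acc = 0;
            for (int d = 0; d < D; d++)
                acc += (int32_t)Q[i][d] * (int32_t)K[j][d];
            score[j] = acc;
        }
        /* Numerically stable softmax over the row. */
        float p[S], max = -1e30f, sum = 0.0f;
        for (int j = 0; j < S; j++) {
            p[j] = (float)score[j] * score_scale / sqrtf((float)D);
            if (p[j] > max) max = p[j];
        }
        for (int j = 0; j < S; j++) {
            p[j] = expf(p[j] - max);
            sum += p[j];
        }
        /* Probability-weighted sum over V, requantized to int8. */
        for (int d = 0; d < D; d++) {
            float acc = 0.0f;
            for (int j = 0; j < S; j++)
                acc += (p[j] / sum) * (float)V[j][d];
            out[i][d] = requant(acc, out_scale);
        }
    }
}
```

In a heterogeneous deployment of the kind described above, a kernel like this is exactly what gets offloaded from the RISC-V cores to the accelerator, with the deployment flow handling tiling and the per-tensor quantization scales.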