One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators, supported by an automated deployment flow. We demonstrate an Attention-based model within a tinyML power envelope on an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables an end-to-end 8-bit MobileBERT, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s at 32.5 Inf/s while consuming 52.0 mW (0.65 V, 22 nm FD-SOI technology).
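As a quick consistency check, derived here from the abstract's own figures rather than stated in the source, the reported power follows from throughput and energy efficiency, and the per-inference workload follows from throughput and inference rate:
\[
P = \frac{154\ \text{GOp/s}}{2960\ \text{GOp/J}} \approx 52.0\ \text{mW},
\qquad
\frac{154\ \text{GOp/s}}{32.5\ \text{Inf/s}} \approx 4.7\ \text{GOp per inference}.
\]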