Deep learning (DL) is characterised by its dynamic nature, with new deep neural network (DNN) architectures and approaches emerging every few years and driving the field's advancement. At the same time, the ever-increasing use of mobile devices (MDs) has resulted in a surge of DNN-based mobile applications. Although traditional architectures, such as CNNs and RNNs, have been successfully integrated into MDs, this is not the case for Transformers, a relatively new model family that has achieved new levels of accuracy across AI tasks but poses significant computational challenges. In this work, we take steps towards bridging this gap by examining the current state of on-device execution of Transformers. To this end, we construct a benchmark of representative models and thoroughly evaluate their performance across MDs with different computational capabilities. Our experimental results show that Transformers are not accelerator-friendly and indicate the need for software and hardware optimisations to achieve efficient deployment.