The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. Early-exit architectures offer a promising solution, appending additional exit branches to intermediate layers of the encoder. In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands. Previous research on early-exit ASR models has relied on pre-trained self-supervised models fine-tuned with an early-exit loss. In this paper, we undertake an experimental comparison between fine-tuning pre-trained backbones and training models from scratch with the early-exit objective. Experiments conducted on public datasets reveal that early-exit models trained from scratch not only preserve performance when using fewer encoder layers but also achieve higher task accuracy than single-exit or pre-trained models. Furthermore, we explore an exit selection strategy grounded in posterior probabilities as an alternative to the conventional frame-based entropy approach. Our results provide insights into the training dynamics of early-exit architectures for ASR, particularly the efficacy of different training strategies and exit selection methods.
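To make the two exit-selection criteria concrete, the sketch below illustrates, under stated assumptions, how an entropy-based rule and a posterior-probability-based rule could be applied to the per-frame output distributions of each exit branch. This is a minimal illustration, not the paper's implementation: the function names, the top-1 posterior averaging, and the threshold values are assumptions introduced here for clarity.

```python
import numpy as np

def should_exit_entropy(log_probs, threshold):
    """Conventional criterion: exit when the mean per-frame entropy of the
    exit's posterior distribution falls below a threshold.
    log_probs: (T, V) log-posteriors over the vocabulary for one exit."""
    probs = np.exp(log_probs)
    frame_entropy = -(probs * log_probs).sum(axis=-1)  # (T,)
    return frame_entropy.mean() < threshold

def should_exit_posterior(log_probs, threshold):
    """Alternative criterion (illustrative): exit when the mean top-1
    posterior probability across frames exceeds a threshold."""
    top1 = np.exp(log_probs).max(axis=-1)  # (T,)
    return top1.mean() > threshold

def select_exit(exit_log_probs, criterion, threshold):
    """Evaluate exits from shallow to deep and stop at the first one whose
    confidence criterion is satisfied; otherwise fall through to the last."""
    for i, lp in enumerate(exit_log_probs):
        if criterion(lp, threshold):
            return i
    return len(exit_log_probs) - 1

if __name__ == "__main__":
    # Toy usage: 3 exits, 50 frames, 32 output tokens, random logits
    # normalised to log-posteriors via a log-softmax.
    rng = np.random.default_rng(0)
    logits = [rng.standard_normal((50, 32)) for _ in range(3)]
    log_probs = [x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
                 for x in logits]
    print(select_exit(log_probs, should_exit_posterior, threshold=0.5))
```

In both rules, a lower entropy or a higher top-1 posterior signals that the intermediate exit is already confident enough for decoding to stop, trading a small amount of accuracy for reduced computation.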