The ability to dynamically adjust the computational load of neural models at inference time is crucial for on-device processing, where computational power is limited and time-varying. Established approaches to neural model compression exist, but they yield architecturally static models. In this paper, we investigate early-exit architectures, which rely on intermediate exit branches, applied to large-vocabulary speech recognition. This allows for the development of dynamic models that adjust their computational cost to the available resources and recognition performance. Unlike previous works, besides using pre-trained backbones, we also train the model from scratch with an early-exit architecture. Experiments on public datasets show that early-exit architectures trained from scratch not only preserve performance levels when using fewer encoder layers, but also improve task accuracy compared with single-exit models or pre-trained models. Additionally, we investigate an exit selection strategy based on posterior probabilities as an alternative to frame-based entropy.
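To make the two exit selection criteria concrete, the following is a minimal sketch, not the method as implemented in the paper. It assumes each exit branch yields frame-level posterior distributions over the vocabulary, and that an utterance leaves at the first exit whose score passes a threshold. The function names, the averaging over frames, the use of the maximum posterior per frame as the confidence score, and the threshold values are all illustrative assumptions.

```python
import math
from typing import Sequence

# One exit's output: a list of frames, each a probability distribution
# over the vocabulary (assumed layout, for illustration only).
Posteriors = Sequence[Sequence[float]]


def mean_frame_entropy(posteriors: Posteriors) -> float:
    """Average Shannon entropy (in nats) over all frames of one exit."""
    total = 0.0
    for frame in posteriors:
        total += -sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)


def mean_max_posterior(posteriors: Posteriors) -> float:
    """Average of the highest posterior in each frame of one exit."""
    return sum(max(frame) for frame in posteriors) / len(posteriors)


def select_exit_by_entropy(exits: Sequence[Posteriors], threshold: float) -> int:
    """Index of the first exit whose mean entropy is at or below the threshold."""
    for index, posteriors in enumerate(exits):
        if mean_frame_entropy(posteriors) <= threshold:
            return index
    return len(exits) - 1  # no exit was confident enough: use the last one


def select_exit_by_posterior(exits: Sequence[Posteriors], threshold: float) -> int:
    """Index of the first exit whose mean max posterior reaches the threshold."""
    for index, posteriors in enumerate(exits):
        if mean_max_posterior(posteriors) >= threshold:
            return index
    return len(exits) - 1  # no exit was confident enough: use the last one


# Toy posteriors for three exits over a 3-symbol vocabulary (made-up numbers).
toy_exits = [
    [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]],  # early exit: near-uniform
    [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10]],  # middle exit: moderately peaked
    [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # last exit: sharply peaked
]

entropy_choice = select_exit_by_entropy(toy_exits, threshold=0.5)
posterior_choice = select_exit_by_posterior(toy_exits, threshold=0.9)
```

In a real early-exit decoder the encoder layers would be evaluated lazily, so that computation stops at the selected exit; here all exits are precomputed only to keep the sketch short.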