Despite their impressive performance, end-to-end speech recognition models remain challenging to deploy under limited computing resources. As models grow larger and are applied more widely, selectively executing model components for different inputs to improve inference efficiency is of great interest. In this paper, we propose a dynamic layer-skipping method that uses the CTC blank output of intermediate layers to trigger skipping of the last few encoder layers for frames with high blank probabilities. Furthermore, we factorize the CTC output distribution and perform knowledge distillation on intermediate layers to reduce computation and improve recognition accuracy. Experimental results show that, by utilizing the CTC blank, the encoder depth can be adjusted dynamically per frame, yielding a 29% acceleration of CTC model inference with minor performance degradation.
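The skipping rule described above can be illustrated with a minimal sketch. It is not the implementation from the paper: the function name `blank_skip_encode`, the `blank_prob_fn` callable, the default threshold of 0.9, and the use of NumPy are all assumptions made for illustration. The sketch also assumes every layer maps a `(T, D)` array to an array of the same shape, and it ignores how dropping frames would change the context seen by self-attention in the upper layers.

```python
import numpy as np

def blank_skip_encode(x, lower_layers, upper_layers, blank_prob_fn, threshold=0.9):
    """Hypothetical sketch of blank-triggered layer skipping.

    x             : (T, D) array of frame features.
    lower_layers  : callables applied to every frame.
    upper_layers  : callables applied only to frames that are not skipped.
    blank_prob_fn : maps (T, D) hidden states to a (T,) array of
                    intermediate CTC blank probabilities (assumed interface).
    threshold     : frames whose blank probability exceeds this value
                    skip the upper layers (assumed value).
    Returns (output, skip_mask).
    """
    h = x
    for layer in lower_layers:
        h = layer(h)

    # Blank probability per frame, estimated at the intermediate layer.
    p_blank = blank_prob_fn(h)
    skip = p_blank > threshold

    # Skipped frames keep their intermediate representation.
    out = h.copy()
    keep_idx = np.flatnonzero(~skip)
    if keep_idx.size > 0:
        h_keep = h[keep_idx]
        for layer in upper_layers:
            h_keep = layer(h_keep)
        out[keep_idx] = h_keep
    return out, skip
```

Under these assumptions, the upper layers run on fewer frames as the share of high-blank frames grows, which is where any saving would come from. The actual speed-up depends on the model and on the threshold chosen.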