Transformer-based models have recently achieved strong results in end-to-end (E2E) automatic speech recognition (ASR), making it possible to deploy E2E ASR systems on smart devices. However, these models still require a large number of parameters. To overcome this drawback of general-purpose Transformer models for ASR on edge devices, we propose reusing blocks within the Transformer to build a small-footprint ASR system that accommodates resource limitations without compromising recognition accuracy. Specifically, we design a novel block-reusing strategy for the speech Transformer (BRST) to improve parameter efficiency, and we propose an adapter module (ADM) that yields a compact and adaptable model by adding only a few trainable parameters to each reused block. In experiments on the public AISHELL-1 corpus, the proposed approach achieves a character error rate (CER) of 9.3%/6.63% with only 7.6M/8.3M parameters without and with the ADM, respectively. We also provide a deeper analysis of the effect of the ADM in the general block-reusing method.
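To make the parameter trade-off behind block reuse concrete, the following is a minimal accounting sketch rather than the configuration used in this work. It assumes a standard encoder block (self-attention, feed-forward network, two layer norms) and a bottleneck-style adapter (down-projection followed by up-projection). The adapter design, the function names, and the sizes `d_model=256`, `d_ff=2048`, `n_layers=12`, `bottleneck=32` are all illustrative assumptions and are not taken from the abstract.

```python
def block_params(d_model: int, d_ff: int) -> int:
    """Weights and biases of one assumed Transformer encoder block."""
    attn = 4 * (d_model * d_model + d_model)  # Q, K, V and output projections
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    norms = 2 * (2 * d_model)  # gain and bias for each of two layer norms
    return attn + ffn + norms


def adapter_params(d_model: int, bottleneck: int) -> int:
    """Assumed bottleneck adapter: down-projection plus up-projection, with biases."""
    down = d_model * bottleneck + bottleneck
    up = bottleneck * d_model + d_model
    return down + up


def stack_params(n_layers: int, d_model: int, d_ff: int,
                 reuse: bool = False, bottleneck: int = 0) -> int:
    """Parameters of an n_layers-deep stack.

    reuse=True      -> one block's weights are shared by every layer.
    bottleneck > 0  -> each layer additionally owns a small private adapter.
    """
    one_block = block_params(d_model, d_ff)
    blocks = one_block if reuse else n_layers * one_block
    adapters = n_layers * adapter_params(d_model, bottleneck) if bottleneck else 0
    return blocks + adapters


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the ratios.
    cfg = dict(n_layers=12, d_model=256, d_ff=2048)
    print("independent blocks:", stack_params(**cfg))
    print("one reused block:  ", stack_params(**cfg, reuse=True))
    print("reused + adapters: ", stack_params(**cfg, reuse=True, bottleneck=32))
```

Under these assumptions, sharing one block removes most of the stack's parameters, while the per-layer adapters add back only a small fraction, which is the kind of trade-off the BRST and ADM combination targets.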