In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their high computational and storage requirements make them difficult to deploy. To address this issue, this paper proposes EfficientASR, a lightweight model that aims to broaden the applicability of Transformer models. EfficientASR employs two primary modules: Shared Residual Multi-Head Attention (SRMHA) and Chunk-Level Feedforward Networks (CFFN). The SRMHA module reduces redundant computation in the network, while the CFFN module captures spatial knowledge and reduces the number of parameters. The effectiveness of EfficientASR is validated on two public datasets, Aishell-1 and HKUST. Experimental results show a 36% reduction in parameters compared with the baseline Transformer network, along with reductions of 0.3% and 0.2% in Character Error Rate (CER) on Aishell-1 and HKUST, respectively.
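The abstract does not say how CFFN is constructed. The sketch below only shows why chunking a feedforward layer can cut parameters. It assumes, hypothetically, that a chunk-level FFN splits the model dimension into equal chunks and gives each chunk its own smaller two-layer FFN. It then compares that parameter count with a standard Transformer position-wise FFN. The helper names, the sizes `d_model = 256` and `d_ff = 2048`, and the chunk count are illustrative assumptions, not values taken from the paper.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameters of a standard position-wise FFN:
    Linear(d_model -> d_ff) followed by Linear(d_ff -> d_model), both with biases."""
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def chunked_ffn_params(d_model: int, d_ff: int, chunks: int) -> int:
    """Hypothetical chunk-level FFN (an assumption, not the paper's definition):
    the d_model features are split into `chunks` equal groups, and each group is
    processed by its own FFN of shape (d_model/chunks -> d_ff/chunks -> d_model/chunks)."""
    if d_model % chunks or d_ff % chunks:
        raise ValueError("d_model and d_ff must be divisible by chunks")
    return chunks * ffn_params(d_model // chunks, d_ff // chunks)


if __name__ == "__main__":
    # Illustrative sizes only; the paper's configuration is not given in the abstract.
    full = ffn_params(256, 2048)
    chunked = chunked_ffn_params(256, 2048, chunks=4)
    print(full, chunked, f"{1 - chunked / full:.1%} fewer FFN parameters")
```

Under this assumption the weight matrices shrink by a factor equal to the number of chunks, while the bias count stays the same. The trade-off is that features in different chunks no longer mix inside the FFN. The sketch says nothing about how the paper's reported 36% whole-model reduction is distributed across modules.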