To address the poor generalization of end-to-end deep learning speech recognition models, this study proposes Conformer-R, a Conformer-based speech recognition model that incorporates the R-drop structure. The Conformer model, which has shown promising results in speech recognition, models both local and global speech information, while the R-drop structure reduces overfitting. Together, these improve the model's ability to generalize and its overall recognition efficiency. The model was first pre-trained on the Aishell1 and Wenetspeech datasets for general-domain adaptation and then fine-tuned on computer-related audio data. Comparison tests against classic models such as LAS and Wenet on the same test set show that Conformer-R effectively improves generalization.
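The abstract does not spell out how the R-drop structure enters training, so the following is only a minimal sketch of the general R-drop idea, not the authors' implementation. It assumes the commonly described formulation: the same input is passed through the network twice with independent dropout masks, and the loss adds a symmetric KL term between the two output distributions to the usual cross-entropy. The toy linear `forward` function, the weight `alpha`, and all numeric values are hypothetical stand-ins; in a Conformer the two passes would go through the full encoder and decoder instead.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(features, weights, rate, rng):
    """Hypothetical toy model: dropout on the input, then one linear layer."""
    dropped = dropout(features, rate, rng)
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(logits)


def r_drop_loss(p1, p2, target, alpha=1.0):
    """Cross-entropy of both passes plus alpha times the symmetric KL between them."""
    ce = -math.log(p1[target]) - math.log(p2[target])
    sym_kl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    return ce + alpha * sym_kl


# Two forward passes over the same input; the dropout masks differ,
# so the two predicted distributions generally differ as well.
rng = random.Random(0)
features = [0.5, -1.2, 0.8, 0.3]
weights = [
    [0.2, -0.1, 0.4, 0.0],
    [-0.3, 0.5, 0.1, 0.2],
    [0.1, 0.1, -0.2, 0.3],
]
p1 = forward(features, weights, rate=0.1, rng=rng)
p2 = forward(features, weights, rate=0.1, rng=rng)
loss = r_drop_loss(p1, p2, target=2, alpha=1.0)
```

The KL term penalizes disagreement between the two dropout-perturbed predictions, which is the mechanism usually credited with reducing overfitting; when both passes agree exactly, the loss reduces to the sum of the two cross-entropy terms.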