Contextualized ASR models have been shown to effectively improve the recognition accuracy of uncommon phrases when a predefined phrase list is available. However, these models often struggle in bilingual settings, which are prevalent in code-switching speech recognition. In this study, we make an initial attempt to address this challenge by introducing a Cross-lingual Contextual Biasing (XCB) module. Specifically, we augment a pre-trained ASR model for the dominant language with an auxiliary language biasing module and a supplementary language-specific loss, aiming to enhance the recognition of phrases in the secondary language. Experiments conducted on our in-house code-switching dataset validate the efficacy of our approach, demonstrating significant improvements in the recognition of biasing phrases in the secondary language without any additional inference overhead. Furthermore, our proposed system exhibits both efficiency and generalization when applied to the unseen ASRU-2019 test set.
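To make the described design concrete, the following is a minimal PyTorch sketch of how an auxiliary biasing module and a language-specific loss might be attached to a pre-trained encoder. The module name `XCBBiasing`, the cross-attention fusion, and the weight `lambda_aux` are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class XCBBiasing(nn.Module):
    """Hypothetical cross-lingual contextual biasing (XCB) module.

    Attends from acoustic encoder states to embeddings of the
    secondary-language biasing phrases and fuses the result back
    into the encoder output via a residual connection (an assumed
    design, not the paper's exact architecture).
    """

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, enc_out: torch.Tensor, phrase_emb: torch.Tensor) -> torch.Tensor:
        # enc_out:    (B, T, d_model) states from the pre-trained ASR encoder
        # phrase_emb: (B, P, d_model) embeddings of the biasing-phrase list
        bias, _ = self.cross_attn(enc_out, phrase_emb, phrase_emb)
        # Residual fusion keeps the frozen backbone's behavior intact
        # when the biasing signal is weak.
        return self.norm(enc_out + bias)


def total_loss(asr_loss: torch.Tensor,
               aux_lang_loss: torch.Tensor,
               lambda_aux: float = 0.3) -> torch.Tensor:
    # Combined objective: main ASR loss plus a language-specific
    # auxiliary loss on secondary-language tokens; lambda_aux is an
    # assumed weighting, not a value reported in the paper.
    return asr_loss + lambda_aux * aux_lang_loss
```

Because the biasing module only adds a residual term on top of the existing encoder states, it can be trained with the backbone frozen and imposes no extra cost at inference when the phrase list is empty, consistent with the no-overhead claim above.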