Mixture-of-experts (MoE) models, which use language experts to extract language-specific representations, have been applied successfully to code-switching automatic speech recognition (ASR). However, substantial room for improvement remains, because similar pronunciations across languages can lead to ineffective multi-language modeling and inaccurate language boundary estimation. To address these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, together named Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse these representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Third, we use a boundary-aware predictor to learn boundary representations and thereby reduce language boundary confusion. Our approach reduces the mixture error rate by 16.55% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
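To make the adapter-and-gate idea concrete, the following is a minimal sketch of one encoder layer's fusion step in plain Python. It is not the authors' implementation. The bottleneck form of the adapter, the gate taking the layer input as its own input, the dimensions, the random weights, and all names (`Adapter`, `GatedAdapterLayer`, `forward`) are assumptions chosen for illustration. The sketch only shows how per-language adapter outputs could be fused by softmax gate weights, and how the per-adapter mean outputs, on which the abstract says the language adaptation loss is computed, could be obtained.

```python
import math
import random


def linear(x, w, b):
    """Compute w @ x + b for a vector x, with w given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_linear(rng, n_out, n_in):
    """Return (weights, bias) with small random weights (illustrative only)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, residual."""

    def __init__(self, rng, dim, bottleneck):
        self.down = init_linear(rng, bottleneck, dim)
        self.up = init_linear(rng, dim, bottleneck)

    def __call__(self, x):
        h = [max(0.0, v) for v in linear(x, *self.down)]
        return [xi + ui for xi, ui in zip(x, linear(h, *self.up))]


class GatedAdapterLayer:
    """One adapter per language plus a shared gate (assumed layout)."""

    def __init__(self, dim=4, bottleneck=2, n_langs=2, seed=0):
        rng = random.Random(seed)
        self.adapters = [Adapter(rng, dim, bottleneck) for _ in range(n_langs)]
        self.gate = init_linear(rng, n_langs, dim)

    def forward(self, frames):
        fused = []
        per_lang = [[] for _ in self.adapters]
        for x in frames:
            outs = [adapter(x) for adapter in self.adapters]
            # Gate weights are non-negative and sum to 1 for each frame.
            weights = softmax(linear(x, *self.gate))
            fused.append(
                [sum(w * o[d] for w, o in zip(weights, outs)) for d in range(len(x))]
            )
            for k, o in enumerate(outs):
                per_lang[k].append(o)
        # Mean over time of each adapter's output, one vector per language.
        means = [[sum(col) / len(seq) for col in zip(*seq)] for seq in per_lang]
        return fused, means


# Toy input: 5 frames of 4-dimensional features (made-up values).
frames = [[0.1 * t, -0.2, 0.3, 0.05 * t] for t in range(5)]
layer = GatedAdapterLayer()
fused, means = layer.forward(frames)
```

Because the gate weights form a convex combination, each fused frame lies between the adapter outputs in every dimension. In a trained system, each vector in `means` would presumably feed a language classification objective; that loss and the boundary-aware predictor are left out here because the abstract does not specify their form.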