Mixture-of-experts (MoE) models, which use language experts to extract language-specific representations, have been applied successfully to code-switching automatic speech recognition (ASR). However, there is still substantial room for improvement, because similar pronunciations across languages can lead to ineffective multi-language modeling and inaccurate language boundary estimation. To address these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, together called Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse these representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. In addition, we use a boundary-aware predictor to learn boundary representations and so reduce language boundary confusion. Our approach achieves a significant performance improvement, reducing the mixture error rate by 16.55% compared with the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
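The adapter-plus-gate idea described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the bottleneck form of `Adapter`, the softmax gate computed from the layer input, the class name `GatedAdapterLayer`, the language tags, and the random weights. The sketch covers only the per-layer fusion and the time-averaged adapter outputs on which a language adaptation loss could be computed; it does not model the cross-layer connections, the loss itself, or the boundary-aware predictor.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (stand-in for trained parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class Adapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, residual."""

    def __init__(self, dim, bottleneck, rng):
        self.down = rand_matrix(bottleneck, dim, rng)
        self.up = rand_matrix(dim, bottleneck, rng)

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.down, x)]
        return [xi + ui for xi, ui in zip(x, matvec(self.up, h))]


class GatedAdapterLayer:
    """Per-language adapters whose outputs are fused by one shared gate (assumed form)."""

    def __init__(self, dim, bottleneck, languages=("zh", "en"), seed=0):
        rng = random.Random(seed)
        self.languages = languages
        self.adapters = [Adapter(dim, bottleneck, rng) for _ in languages]
        self.gate = rand_matrix(len(languages), dim, rng)

    def forward(self, frames):
        fused = []
        per_lang = [[] for _ in self.languages]
        for x in frames:
            outs = [adapter(x) for adapter in self.adapters]
            # One gate weight per language, summing to 1 for each frame.
            weights = softmax(matvec(self.gate, x))
            fused.append(
                [sum(w * o[d] for w, o in zip(weights, outs)) for d in range(len(x))]
            )
            for k, o in enumerate(outs):
                per_lang[k].append(o)
        # Mean over time of each adapter's output: the quantity a
        # language adaptation loss would be computed on.
        means = [[sum(col) / len(seq) for col in zip(*seq)] for seq in per_lang]
        return fused, means


layer = GatedAdapterLayer(dim=4, bottleneck=2)
frames = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.4, -0.1, 0.2], [0.0, 0.0, 0.1, -0.3]]
fused, means = layer.forward(frames)
```

Here `fused` holds one gated vector per input frame, and `means` holds one time-averaged vector per language adapter. Sharing a single gate across the adapters is one way to read "unified gating layer"; the original work may define it differently.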