Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer with external language model (LM) fusion for speech recognition. In this work, we show that sequence discriminative training is strongly correlated with ILM subtraction, from both theoretical and empirical points of view. Theoretically, we derive that the global optimum of maximum mutual information (MMI) training has a form similar to that of ILM subtraction. Empirically, we show that ILM subtraction and sequence discriminative training achieve similar performance across a wide range of experiments on Librispeech, covering both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs with both full and limited context. The benefit of ILM subtraction also becomes much smaller after sequence discriminative training. We also provide an in-depth study showing that sequence discriminative training has a minimal effect on the commonly used zero-encoder ILM estimation, but jointly affects the encoder and the prediction + joint network, reshaping the posterior probabilities through both ILM suppression and blank suppression.
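For reference, ILM subtraction is commonly written as a log-linear score combination at decoding time. The sketch below uses assumed notation that does not appear in the text above: $X$ is the acoustic input, $W$ a candidate word sequence, $P_{\text{RNNT}}$ the transducer posterior, $P_{\text{LM}}$ the external LM, $P_{\text{ILM}}$ the estimated internal LM, and $\lambda_1, \lambda_2$ tunable scales.

```latex
\hat{W} = \arg\max_{W} \Big[ \log P_{\text{RNNT}}(W \mid X)
        + \lambda_1 \log P_{\text{LM}}(W)
        - \lambda_2 \log P_{\text{ILM}}(W) \Big]
```

Under this notation, setting $\lambda_2 = 0$ gives plain LM fusion without ILM correction. The theoretical claim is that the transducer posterior at the global optimum of MMI training takes a comparable form, which would be consistent with the smaller gain from ILM subtraction after sequence discriminative training.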