The deployment environment of an ASR model is ever-changing, and the incoming speech can switch across different domains during a session. This makes effective domain adaptation challenging when only target-domain text data is available. Our objective is to obtain clearly improved performance on the target domain while degrading performance on the general domain as little as possible. In this paper, we propose an adaptive LM fusion approach called internal language model estimation based adaptive domain adaptation (ILME-ADA). To realize ILME-ADA, an interpolated log-likelihood score is calculated based on the maximum of the scores from the internal LM (ILM) and the external LM (ELM). We demonstrate the efficacy of the proposed ILME-ADA method with both the RNN-T and LAS modeling frameworks, employing neural network and n-gram LMs as ELMs respectively, on two domain-specific (target) test sets. Compared with both shallow fusion and ILME-based LM fusion, the proposed method achieves significantly better performance on the target test sets with minimal performance degradation on the general test set.
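The abstract states the fusion rule only at a high level ("based on the maximum of the scores from the ILM and the ELM"). The sketch below is one possible reading under stated assumptions, not the paper's exact formulation. It assumes log-probabilities already computed by the E2E model, the ILM estimate, and the ELM. It also assumes a single hypothetical weight `lam` and an ILME-style form in which the ILM score is subtracted and the larger of the ILM and ELM scores is added back. The function names (`shallow_fusion`, `ilme_fusion`, `ilme_ada_score`) and the weights are illustrative. Shallow fusion and standard ILME fusion are included for comparison.

```python
def shallow_fusion(log_p_e2e: float, log_p_elm: float, lam_e: float) -> float:
    """Shallow fusion: E2E score plus a weighted external LM score."""
    return log_p_e2e + lam_e * log_p_elm


def ilme_fusion(log_p_e2e: float, log_p_ilm: float, log_p_elm: float,
                lam_i: float, lam_e: float) -> float:
    """ILME-based fusion: subtract the weighted internal LM score, add the weighted ELM score."""
    return log_p_e2e - lam_i * log_p_ilm + lam_e * log_p_elm


def ilme_ada_score(log_p_e2e: float, log_p_ilm: float, log_p_elm: float,
                   lam: float) -> float:
    """Hypothetical ILME-ADA score (assumed form, not taken from the paper).

    The LM term is max(ILM, ELM) minus ILM:
      * ELM > ILM (likely target-domain text): behaves like ILME fusion
        with lam_i == lam_e == lam.
      * ELM <= ILM (likely general-domain text): the LM terms cancel and
        the score falls back to the plain E2E score.
    """
    return log_p_e2e + lam * (max(log_p_ilm, log_p_elm) - log_p_ilm)


if __name__ == "__main__":
    # Target-domain hypothesis: the ELM assigns a higher score than the ILM.
    print(ilme_ada_score(-2.0, log_p_ilm=-5.0, log_p_elm=-3.0, lam=0.5))  # -1.0
    # General-domain hypothesis: the ELM assigns a lower score than the ILM.
    print(ilme_ada_score(-2.0, log_p_ilm=-3.0, log_p_elm=-6.0, lam=0.5))  # -2.0
    # For comparison, ILME fusion penalizes the same general-domain hypothesis.
    print(ilme_fusion(-2.0, -3.0, -6.0, lam_i=0.5, lam_e=0.5))  # -3.5
```

Under this assumed form, a hypothesis that the ELM scores below the ILM, such as general-domain speech the ELM was not trained on, keeps its E2E score instead of being pulled down by the ELM. This is one way the general-domain degradation could stay small while target-domain hypotheses still receive an ILME-style boost. In a real decoder the same rule would presumably be applied to per-token log-probabilities during beam search, which is also an assumption here.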