We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring of automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, in this study we combine up to eight NLMs, i.e., forward and backward long short-term memory (LSTM) LMs and Transformer-LMs, each configuration trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Since these NLMs complement each other, combining them one by one at each rescoring iteration gradually refines the language scores attached to the lattice arcs and, consequently, gradually reduces the errors of the ASR hypotheses. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across the lattice sequence of a long speech, such as a lecture. In experiments using a lecture speech corpus, combining the eight NLMs and using context carry-over yielded a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination.
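The iterative NLM combination and context carry-over described above can be illustrated with a minimal sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: it rescores a short N-best list instead of a lattice, assumes the combined LM score after k iterations is an equal-weight linear interpolation (a running average) of the first k NLMs' log-probabilities, and uses toy bigram LMs in place of LSTM/Transformer-LMs. All names (`Hypothesis`, `iterative_rescore`, `rescore_sequence`, `make_toy_bigram_lm`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical LM interface (an assumption, not the paper's API):
# returns log P(words | context), where context holds previously recognized words.
LanguageModel = Callable[[Sequence[str], Sequence[str]], float]


@dataclass
class Hypothesis:
    words: List[str]
    am_score: float        # acoustic log-likelihood, fixed during rescoring
    lm_score: float = 0.0  # combined LM log-probability, refined per iteration


def iterative_rescore(
    hyps: Sequence[Hypothesis],
    lms: Sequence[LanguageModel],
    context: Sequence[str],
    lm_weight: float = 1.0,
    beam: Optional[int] = None,
) -> List[Hypothesis]:
    """Add one LM per iteration; after k LMs, lm_score is their equal-weight average."""
    # Work on copies so the caller's hypotheses are left untouched.
    current = [Hypothesis(list(h.words), h.am_score) for h in hyps]
    for k, lm in enumerate(lms, start=1):
        for h in current:
            new_score = lm(context, h.words)
            # Running average == linear interpolation of the first k LMs, weight 1/k each.
            h.lm_score = ((k - 1) * h.lm_score + new_score) / k
        current.sort(key=lambda h: h.am_score + lm_weight * h.lm_score, reverse=True)
        if beam is not None:
            # Keep only the top hypotheses: a crude stand-in for regenerating a smaller lattice.
            current = current[:beam]
    return current


def rescore_sequence(
    utterances: Sequence[Sequence[Hypothesis]],
    lms: Sequence[LanguageModel],
    lm_weight: float = 1.0,
    beam: Optional[int] = None,
) -> List[List[str]]:
    """Rescore consecutive utterances, carrying each 1-best result over as LM context."""
    context: List[str] = []
    outputs: List[List[str]] = []
    for hyps in utterances:
        best = iterative_rescore(hyps, lms, context, lm_weight, beam)[0]
        outputs.append(best.words)
        context = context + best.words  # context carry-over to the next utterance
    return outputs


def make_toy_bigram_lm(logprobs: Dict[Tuple[str, str], float], unk: float = -5.0) -> LanguageModel:
    """Toy bigram LM standing in for an NLM; only the last context word is used."""
    def lm(context: Sequence[str], words: Sequence[str]) -> float:
        prev = context[-1] if context else "<s>"
        total = 0.0
        for w in words:
            total += logprobs.get((prev, w), unk)
            prev = w
        return total
    return lm


if __name__ == "__main__":
    lm_a = make_toy_bigram_lm({("<s>", "the"): -1.0, ("the", "cat"): -1.0,
                               ("the", "cap"): -3.0, ("cat", "sat"): -0.5})
    lm_b = make_toy_bigram_lm({("<s>", "the"): -1.0, ("the", "cat"): -2.0,
                               ("the", "cap"): -2.5, ("cat", "sat"): -0.5})
    utt1 = [Hypothesis(["the", "cap"], -1.0), Hypothesis(["the", "cat"], -1.5)]
    utt2 = [Hypothesis(["sad"], -1.0), Hypothesis(["sat"], -2.0)]
    print(rescore_sequence([utt1, utt2], [lm_a, lm_b]))  # [['the', 'cat'], ['sat']]
```

In this toy run, the second utterance is decoded as "sat" only because "cat" is carried over from the first utterance; with an empty context, the acoustically preferred "sad" wins. Because the sketch merely prunes an N-best list after each iteration, it only loosely mimics iterative lattice generation, where a rescorer would update arc-level LM scores instead.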