Language models (LMs) are commonly adopted to boost the performance of automatic speech recognition (ASR), particularly in domain adaptation tasks. The conventional approach to LM training treats all words in a corpus equally, resulting in suboptimal improvements in ASR performance. In this work, we introduce a novel correction-focused LM training approach that prioritizes ASR-fallible words. The word-level ASR fallibility score, representing the likelihood of ASR mis-recognition, is defined and shaped into a prior word distribution that guides LM training. To enable correction-focused training with text-only corpora, large language models (LLMs) are employed as fallibility score predictors and text generators through multi-task fine-tuning. Experimental results on domain adaptation tasks demonstrate the effectiveness of the proposed method. Compared with conventional LMs, correction-focused training achieves a relative word error rate (WER) reduction of up to 5.5% in sufficient-text scenarios. In insufficient-text scenarios, LM training with LLM-generated text achieves a relative WER reduction of up to 13%, and correction-focused training yields a further relative WER reduction of up to 6%.
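To make the idea of a fallibility-shaped prior concrete, the following is a minimal sketch of one way per-word fallibility scores could reweight an LM training loss. It is an illustration under stated assumptions, not the formulation used in this work: the function names `fallibility_prior` and `correction_focused_nll`, the additive `smoothing` term, and the assumption that scores are non-negative are all hypothetical choices made for the example.

```python
import math


def fallibility_prior(scores, smoothing=1.0):
    """Turn per-word fallibility scores into weights that sum to 1.

    Assumption (not from the paper): scores are non-negative, and a
    positive additive `smoothing` term keeps every word at a nonzero weight.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")
    shifted = [s + smoothing for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]


def correction_focused_nll(log_probs, scores, smoothing=1.0):
    """Negative log-likelihood weighted by the fallibility prior.

    `log_probs[i]` is the LM log-probability of the i-th reference word.
    With equal scores this reduces to the ordinary mean NLL.
    """
    if len(log_probs) != len(scores):
        raise ValueError("log_probs and scores must have the same length")
    weights = fallibility_prior(scores, smoothing)
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Illustrative values only: three words, the second one poorly predicted.
example_log_probs = [math.log(0.9), math.log(0.1), math.log(0.8)]
uniform_loss = correction_focused_nll(example_log_probs, [0.0, 0.0, 0.0])
focused_loss = correction_focused_nll(example_log_probs, [0.0, 1.0, 0.0])
```

In this sketch, raising the score of the poorly predicted second word increases its share of the loss, so gradient updates would concentrate on the words the recognizer is assumed most likely to get wrong, while equal scores recover conventional training.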