ASR error correction remains an important part of post-processing for speech recognition systems. Traditionally, error correction models are trained in a supervised manner on the decoding results of the underlying ASR system paired with the reference text. This approach is computationally intensive, and the model must be retrained whenever the underlying ASR model changes. Recent years have seen the development of large language models that can perform natural language processing tasks in a zero-shot manner. In this paper, we take ChatGPT as an example and examine its ability to perform ASR error correction in zero-shot and 1-shot settings. We use the ASR N-best list as model input and propose unconstrained and N-best constrained error correction methods. Results on a Conformer-Transducer model and the pre-trained Whisper model show that error correction with ChatGPT can substantially improve ASR system performance.
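
To make the two settings concrete, here is a minimal sketch of how an N-best list might be turned into a prompt and how a free-form correction might be constrained back to the N-best list. The abstract does not specify either step, so everything here is an assumption for illustration: the prompt wording, the `<hypK>` tag format, the function names, and the choice to pick the hypothesis with the smallest word-level edit distance to the model output. The call to the language model itself is omitted.

```python
from typing import List


def build_prompt(nbest: List[str]) -> str:
    """Format an N-best list as a zero-shot correction prompt.

    The wording and the <hypK> tags are hypothetical, not taken from the paper.
    """
    lines = [f"<hyp{i}> {hyp} </hyp{i}>" for i, hyp in enumerate(nbest, start=1)]
    header = (
        "Below are the N-best hypotheses from a speech recognizer "
        "for one utterance. Reply with the corrected transcription only."
    )
    return header + "\n" + "\n".join(lines)


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def constrain_to_nbest(llm_output: str, nbest: List[str]) -> str:
    """Return the N-best hypothesis closest to the model's free-form output.

    Assumed constraint rule: minimum case-insensitive word edit distance.
    Ties go to the higher-ranked (earlier) hypothesis.
    """
    target = llm_output.lower().split()
    return min(nbest, key=lambda hyp: word_edit_distance(hyp.lower().split(), target))
```

In the unconstrained setting the model's reply would be used directly as the corrected transcription. In the constrained setting, `constrain_to_nbest` guarantees that the final output is one of the hypotheses the ASR system actually produced, which limits the damage if the model rewrites the utterance too freely.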