We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction. Our first focus is on instruction prompting, which lets LLMs perform these tasks without fine-tuning. We evaluate different prompting schemes, including zero- and few-shot in-context learning, and a novel task-activation prompting method that combines causal instructions and demonstrations to extend the context given to the LLM. Next, using a pretrained first-pass recognition system and rescoring its output on two out-of-domain tasks (ATIS and WSJ), we show that rescoring with frozen LLMs through in-context learning alone achieves results competitive with rescoring by domain-tuned LMs. By combining prompting techniques with fine-tuning, we achieve error rates below the N-best oracle level, showcasing the generalization power of LLMs.
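To make "rescoring" and the "N-best oracle level" concrete, the sketch below assumes a common log-linear scheme: each first-pass hypothesis score is combined with a weighted LM log-score, and the highest total wins. The abstract does not state the exact combination rule, so the interpolation form, the `weight` value, and `toy_lm_score` (a hypothetical stand-in for a frozen LLM's log-probability) are all illustrative assumptions. The oracle WER is the lowest WER of any hypothesis already in the N-best list. Rescoring only selects from that list, so it can never beat the oracle. Generative error correction can emit a new string, which is how an error rate below the oracle becomes possible.

```python
from typing import Callable, List, Tuple

def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    # Levenshtein distance over words (substitutions, insertions, deletions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]

def wer(ref: str, hyp: str) -> float:
    # Word error rate: word-level edit distance divided by reference length.
    ref_words = ref.split()
    return word_edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)

def rescore(nbest: List[Tuple[str, float]],
            lm_score: Callable[[str], float],
            weight: float = 0.5) -> str:
    # Assumed log-linear rescoring: first-pass score + weight * LM score.
    return max(nbest, key=lambda h: h[1] + weight * lm_score(h[0]))[0]

def oracle_wer(ref: str, nbest: List[Tuple[str, float]]) -> float:
    # Best WER achievable by picking any single hypothesis from the list.
    return min(wer(ref, text) for text, _ in nbest)

# Hypothetical toy "LM": -2.0 log-score per out-of-vocabulary word.
# A real system would query a (frozen or fine-tuned) LLM here instead.
VOCAB = {"show", "flights", "from", "boston", "to", "denver"}

def toy_lm_score(text: str) -> float:
    return -2.0 * sum(word not in VOCAB for word in text.split())

ref = "show flights from boston to denver"
nbest = [
    ("show flights from boston to tenver", -4.0),
    ("show flight from boston to denver", -4.2),
    ("show flights from boston to denver", -4.5),
]
best = rescore(nbest, toy_lm_score, weight=0.5)
```

Here the first-pass top hypothesis contains "tenver", but the LM penalty flips the ranking so `best` is the correct transcript. If that correct hypothesis were missing from `nbest`, `oracle_wer` would be 1/6, and only a generative corrector that outputs the fixed string could reach 0.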