We explore the ability of large language models (LLMs) to act as post-processors for automatic speech recognition (ASR), performing rescoring and error correction. Our focus is on instruction prompting that lets LLMs perform these tasks without fine-tuning. To this end, we evaluate several prompting schemes: zero-shot and few-shot in-context learning, and a novel task-activating prompting (TAP) method that combines instruction and demonstration. Using a pre-trained first-pass system and rescoring its output on two out-of-domain tasks (ATIS and WSJ), we show that rescoring by in-context learning alone, with frozen LLMs, achieves results competitive with rescoring by domain-tuned LMs. By combining prompting techniques with fine-tuning, we achieve error rates below the N-best oracle level, showcasing the generalization power of LLMs.
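To make the terms "rescoring" and "N-best oracle" concrete, the following is a minimal sketch and not the method evaluated here. It assumes each first-pass hypothesis comes with a log-score, and it interpolates that score with a language-model score to pick a winner. The names `rescore`, `oracle`, `toy_lm_score`, and `lm_weight` are hypothetical; `toy_lm_score` is a stand-in for a log-likelihood obtained from an LLM. Since rescoring only selects from the list, its error rate cannot fall below the oracle; going below it requires producing word sequences outside the list, as error correction can.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, first-pass log-score); an assumed format


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def rescore(nbest: List[Hypothesis],
            lm_score: Callable[[str], float],
            lm_weight: float = 0.5) -> str:
    """Return the hypothesis maximizing first-pass score + lm_weight * LM score."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_score(h[0]))[0]


def oracle(nbest: List[Hypothesis], reference: str) -> str:
    """Return the hypothesis in the list closest to the reference transcript."""
    ref = reference.split()
    return min(nbest, key=lambda h: edit_distance(ref, h[0].split()))[0]


def toy_lm_score(text: str) -> float:
    """Hypothetical stand-in for an LLM log-likelihood: penalize unknown words."""
    vocab = {"show", "me", "flights", "to", "boston"}
    return sum(0.0 if w in vocab else -5.0 for w in text.split())


# Illustrative data only, not taken from ATIS or WSJ.
nbest = [
    ("show me flights to austin", -1.0),
    ("show me flights to boston", -1.2),
    ("show knee flights to boston", -1.5),
]
reference = "show me flights to boston"

best = rescore(nbest, toy_lm_score)
best_possible = oracle(nbest, reference)
```

In this toy case the first-pass top hypothesis is wrong, and the interpolated score recovers the second entry, which is also the oracle choice. In practice `lm_weight` would be tuned on held-out data.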