Recent advances in large language models (LLMs) have pushed forward multilingual speech and machine translation by reducing representation errors and incorporating external knowledge. However, both translation tasks typically rely on beam search decoding and top-1 hypothesis selection at inference time. These techniques fail to fully exploit the rich information in the diverse N-best hypotheses, which makes them suboptimal for translation tasks that require a single, high-quality output sequence. In this paper, we propose a new generative paradigm for translation tasks, named "GenTranslate", which builds on LLMs to generate better results from the diverse translation versions in the N-best list. By leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, the new paradigm integrates the information in the N-best candidates to produce a higher-quality translation. Furthermore, to support LLM finetuning, we build and release the HypoTranslate dataset, which contains over 592K hypotheses-translation pairs in 11 languages. Experiments on various speech and machine translation benchmarks (e.g., FLEURS, CoVoST-2, WMT) demonstrate that GenTranslate significantly outperforms the state-of-the-art model.
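To make the contrast between top-1 selection and N-best integration concrete, the following minimal sketch shows one way an N-best list could be packed into a single LLM prompt. The function names and the prompt wording are hypothetical illustrations, not the prompt or code used by GenTranslate, and the call to an actual LLM is deliberately left out.

```python
from typing import List


def top1_select(nbest: List[str]) -> str:
    """Baseline: keep only the highest-ranked hypothesis.

    Assumes the list is sorted best-first, as beam search output usually is.
    """
    return nbest[0]


def build_nbest_prompt(nbest: List[str], target_lang: str) -> str:
    """Format all N-best hypotheses into one instruction prompt for an LLM.

    The template wording is a hypothetical illustration, not the paper's prompt.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    # Number the candidates so the model can see every alternative.
    numbered = "\n".join(f"{i}. {hyp}" for i, hyp in enumerate(nbest, start=1))
    return (
        f"Below are {len(nbest)} candidate {target_lang} translations "
        "of the same source.\n"
        f"{numbered}\n"
        f"Write a single, improved {target_lang} translation:"
    )


if __name__ == "__main__":
    hypotheses = [
        "The cat sat on the mat.",
        "The cat is sitting on the mat.",
        "A cat sat on a mat.",
    ]
    print(top1_select(hypotheses))
    print(build_nbest_prompt(hypotheses, "English"))
```

Under this sketch, the top-1 baseline discards every hypothesis after the first, while the prompt exposes all candidates to the model, which is the information the abstract argues is otherwise left unused.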