Simultaneous machine translation (SimulMT) presents a challenging trade-off between translation quality and latency. Recent studies have shown that LLMs can achieve good performance in SimulMT tasks, but this often comes at the expense of high inference cost and latency. In this paper, we propose a conversational SimulMT framework that enhances the inference efficiency of LLM-based SimulMT through multi-turn-dialogue-based decoding. Our experiments with Llama2-7b-chat on two SimulMT benchmarks demonstrate the superiority of the LLM in translation quality while achieving computational latency comparable to that of specialized SimulMT models.
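For concreteness, the sketch below illustrates what multi-turn-dialogue-based decoding could look like: each newly read source chunk is appended as a user turn, and the model's partial translation is appended back as an assistant turn, so the growing dialogue (and, in practice, its KV cache) is reused at every step. The `generate` callable, the system prompt, and the chunking policy are illustrative placeholders, not the paper's actual implementation.

```python
from typing import Callable, Dict, Iterable, List

Message = Dict[str, str]


def conversational_simulmt(
    source_chunks: Iterable[str],
    generate: Callable[[List[Message]], str],
    system_prompt: str = "Translate the incoming source text incrementally.",
) -> List[str]:
    """Incremental translation via a growing multi-turn dialogue.

    Each arriving source chunk becomes a new user turn; the model's
    partial translation becomes the next assistant turn, so earlier
    turns need not be re-encoded from scratch on later steps.
    """
    messages: List[Message] = [{"role": "system", "content": system_prompt}]
    partial_outputs: List[str] = []

    for chunk in source_chunks:
        # A new source prefix arrives: present it as the next user turn.
        messages.append({"role": "user", "content": chunk})
        # The LLM extends the dialogue with the next translation piece.
        partial_translation = generate(messages)
        messages.append({"role": "assistant", "content": partial_translation})
        partial_outputs.append(partial_translation)

    return partial_outputs


if __name__ == "__main__":
    # Toy stand-in for an LLM call, only to show the loop's data flow.
    def echo_generate(messages: List[Message]) -> str:
        return f"[translation of: {messages[-1]['content']}]"

    chunks = [
        "Simultaneous machine translation",
        "presents a challenging trade-off",
        "between quality and latency.",
    ]
    print(conversational_simulmt(chunks, echo_generate))
```

In an actual LLM-based setup, `generate` would render `messages` with the model's chat template (e.g., Llama2-7b-chat's) and decode only the new assistant turn; the read/write schedule that decides when a chunk is released is left unspecified here.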