One challenge in spoken language translation is that much spoken content is long-form, whereas high-quality translation requires short units. To address this mismatch, we adapt large language models (LLMs) to split long ASR transcripts into segments that can be translated independently so as to maximize overall translation quality. To combat the tendency of LLMs to hallucinate, we incorporate finite-state constraints during decoding to eliminate invalid outputs. We find that LLMs can be adapted to transcripts containing ASR errors through prompt-tuning or fine-tuning. Compared with a state-of-the-art automatic punctuation baseline, our best LLM improves the average BLEU for English-German, English-Spanish, and English-Arabic TED talk translation across 9 test sets by 2.9 points, solely by improving segmentation.
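The abstract does not spell out the finite-state constraints, so the following is only a minimal sketch of the general idea under one stated assumption: a valid output must reproduce the input tokens in order, with an optional boundary symbol between any two of them. The names `BOUNDARY`, `EOS`, `allowed_next`, `constrained_segment`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real LLM scoring function.

```python
from typing import Callable, List, Sequence

BOUNDARY = "<sep>"  # hypothetical segment-boundary symbol
EOS = "<eos>"       # hypothetical end-of-output symbol


def allowed_next(tokens: Sequence[str], pos: int, last_was_boundary: bool) -> List[str]:
    """Symbols the finite-state constraint permits from the current state.

    The state is (pos, last_was_boundary): how many input tokens have been
    copied so far, and whether the previous output symbol was a boundary.
    """
    if pos == len(tokens):
        return [EOS]                      # everything copied: only stopping is valid
    options = [tokens[pos]]               # copying the next input token is always valid
    if pos > 0 and not last_was_boundary:
        options.append(BOUNDARY)          # a boundary may follow a copied token
    return options


def constrained_segment(
    tokens: Sequence[str],
    score: Callable[[List[str], str], float],
) -> List[List[str]]:
    """Greedy decoding restricted to the symbols allowed by the automaton."""
    pos, last_was_boundary = 0, False
    output: List[str] = []
    segments: List[List[str]] = [[]]
    while True:
        options = allowed_next(tokens, pos, last_was_boundary)
        # The scorer never gets to pick a symbol outside `options`,
        # so inserted, dropped, or altered words cannot occur.
        choice = max(options, key=lambda symbol: score(output, symbol))
        if choice == EOS:
            break
        output.append(choice)
        if choice == BOUNDARY:
            segments.append([])
            last_was_boundary = True
        else:
            segments[-1].append(choice)
            pos += 1
            last_was_boundary = False
    return [segment for segment in segments if segment]


def toy_score(prefix: List[str], symbol: str) -> float:
    """Stand-in for an LLM: favors a boundary right after the word 'much'."""
    if symbol == BOUNDARY:
        return 1.0 if prefix and prefix[-1] == "much" else 0.0
    return 0.5


tokens = "thank you so much today i want to talk about speech".split()
segments = constrained_segment(tokens, toy_score)
for segment in segments:
    print(" ".join(segment))
```

Because every step chooses only among the symbols the automaton allows, concatenating the segments always recovers the original transcript, whatever the scorer prefers. A real system would replace the greedy choice with masked logits inside beam search or sampling.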