Machine Translation (MT) remains one of the last NLP tasks where large language models (LLMs) have not yet replaced dedicated supervised systems. This work exploits the complementary strengths of LLMs and supervised MT by guiding LLMs to automatically post-edit MT output using external feedback on its quality, derived from Multidimensional Quality Metrics (MQM) annotations. Working with LLaMA-2 models, we consider prompting strategies that vary the type of feedback provided, and then fine-tune the LLM to improve its ability to exploit that guidance. Through experiments on Chinese-English, English-German, and English-Russian MQM data, we demonstrate that prompting LLMs to post-edit MT improves TER, BLEU, and COMET scores, although the benefit of fine-grained feedback is unclear. Fine-tuning helps the LLM integrate fine-grained feedback more effectively and further improves translation quality under both automatic and human evaluation.
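The prompting strategies differ mainly in how much of the MQM annotation the LLM gets to see. The sketch below contrasts three such levels: a generic hint that errors may exist, a single aggregate score, and span-level errors with category and severity. It rests on several assumptions. The prompt wording, the `MQMError` type, the function names, and the severity weights (minor = 1, major = 5, loosely following common MQM scoring conventions) are all hypothetical illustrations, not the paper's actual templates.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity weights in the style of common MQM scoring
# conventions (minor = 1, major = 5); not taken from the paper.
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0}


@dataclass
class MQMError:
    """One annotated error: the offending MT span, its MQM category and severity."""
    span: str
    category: str
    severity: str


def mqm_penalty(errors: List[MQMError]) -> float:
    """Sum the severity weights of all errors; higher means a worse translation."""
    return sum(SEVERITY_WEIGHTS.get(e.severity, 0.0) for e in errors)


def build_post_edit_prompt(
    source: str,
    mt: str,
    src_lang: str,
    tgt_lang: str,
    errors: List[MQMError],
    feedback: str = "fine-grained",
) -> str:
    """Build a post-editing prompt using one of three hypothetical feedback levels."""
    lines = [
        f"{src_lang} source: {source}",
        f"{tgt_lang} translation: {mt}",
    ]
    if feedback == "generic":
        # No specifics: only signal that the translation may be flawed.
        lines.append("The translation may contain errors.")
    elif feedback == "score":
        # Coarse signal: one aggregate penalty computed from the annotations.
        lines.append(f"MQM error penalty: {mqm_penalty(errors):g} (0 means no errors).")
    elif feedback == "fine-grained":
        # Span-level signal: list every annotated error with category and severity.
        if errors:
            lines.append("Annotated errors:")
            for e in errors:
                lines.append(f'- "{e.span}": {e.category} ({e.severity})')
        else:
            lines.append("No errors were annotated.")
    else:
        raise ValueError(f"unknown feedback type: {feedback!r}")
    lines.append(f"Improve the translation. Output only the corrected {tgt_lang} text.")
    return "\n".join(lines)


if __name__ == "__main__":
    errs = [
        MQMError("river bank", "Accuracy/Mistranslation", "major"),
        MQMError("works at", "Fluency/Grammar", "minor"),
    ]
    for level in ("generic", "score", "fine-grained"):
        print(build_post_edit_prompt(
            "他在银行工作。", "He works at the river bank.",
            "Chinese", "English", errs, feedback=level,
        ))
        print("---")
```

The generic and score-based variants discard most of the annotation, while the fine-grained variant exposes exactly which spans to fix. Fine-tuning on prompts of the latter form is one plausible way to teach a model to use that extra detail.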