Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing

Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their strong performance on tasks that demand broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLMs for MT and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments find that fine-tuning for translation even leads to performance degradation. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than direct translators. Building on the exceptional ability of LLMs to process and generate lengthy sequences, we also propose extending our approach to document-level translation. We show that leveraging Low-Rank-Adapter fine-tuning for APE can yield significant improvements across both sentence- and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy of 89\% on the ContraPro test set, which specifically assesses the model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations.\footnote{An interactive demo for integrating manual feedback can be found \href{https://huggingface.co/spaces/skoneru/contextual_refinement_ende}{here}.}
