Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks, but they have yet to attain state-of-the-art performance in Neural Machine Translation (NMT). Nevertheless, their strong performance in tasks demanding broad understanding and contextual processing shows their potential for translation. To exploit these abilities, we investigate using LLMs for machine translation (MT) and explore recent parameter-efficient fine-tuning techniques. Surprisingly, our initial experiments show that fine-tuning for translation even led to performance degradation. To overcome this, we propose an alternative approach: adapting LLMs as Automatic Post-Editors (APE) rather than as direct translators. Building on LLMs' exceptional ability to process and generate long sequences, we also extend our approach to document-level translation. We show that Low-Rank Adapter (LoRA) fine-tuning for APE can yield significant improvements on both sentence-level and document-level metrics while generalizing to out-of-domain data. Most notably, we achieve a state-of-the-art accuracy of 89% on the ContraPro test set, which specifically assesses a model's ability to resolve pronoun ambiguities when translating from English to German. Lastly, we investigate a practical scenario involving manual post-editing for document-level translation, where reference context is made available. Here, we demonstrate that leveraging human corrections can significantly reduce the number of edits required for subsequent translations (an interactive demo for integrating manual feedback is available at https://huggingface.co/spaces/skoneru/contextual_refinement_ende).
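To make "Low-Rank Adapter fine-tuning" concrete, the following is a minimal sketch of the generic LoRA formulation: the pretrained weight W is frozen, and only a low-rank update B·A (scaled by alpha / rank) is trained, so y = W x + (alpha / rank) · B (A x). It is written in plain Python for illustration only. The class name `LoRALinear`, the initialization scale, and the rank/alpha values are assumptions; it does not reproduce the paper's actual model, adapter placement, or hyperparameters.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: y = W x + (alpha / rank) * B (A x).

    W is frozen (never updated); only A and B would be trained.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: (rank, d_in), small random init (illustrative choice).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B: (d_out, rank), zero init, so the adapter starts as a no-op.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)                # frozen path
        delta = matvec(self.B, matvec(self.A, x))    # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        # Only A and B count: rank * (d_in + d_out) instead of d_in * d_out.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    d = 8
    W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    layer = LoRALinear(W, rank=2, alpha=4)
    x = [float(i) for i in range(d)]
    print(layer.forward(x) == x)  # zero-initialized B leaves W's output unchanged
    print(layer.num_trainable(), "trainable vs", d * d, "full")
```

The zero initialization of B means fine-tuning starts exactly from the pretrained model's behavior, and with rank 2 on an 8×8 layer only 32 of 64 parameters would be trained; for realistic layer sizes, where rank is much smaller than the hidden dimension, the savings are far larger.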