Adapting Large Language Models for Document-Level Machine Translation

Large language models (LLMs) have made significant strides across natural language processing (NLP) tasks. Recent research shows that moderately sized LLMs often outperform their larger counterparts after task-specific fine-tuning. In this work, we study how to adapt LLMs to specialize in document-level machine translation (DocMT) for a specific language pair. First, we explore how prompt strategies affect downstream translation performance. We then conduct extensive experiments with two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our findings indicate that in some cases these specialized models even surpass GPT-4 in translation performance, while in others they still suffer significantly from off-target translation, even when they are fine-tuned exclusively on bilingual parallel documents. Furthermore, we provide an in-depth analysis of these DocMT-tailored LLMs, covering translation errors, the scaling law of parallel documents, out-of-domain generalization, and the impact of zero-shot cross-lingual transfer. These findings not only shed light on the strengths and limitations of LLM-based DocMT models but also provide a foundation for future research in DocMT.
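To make the off-target translation issue concrete, the following is a minimal sketch of how an off-target rate could be measured: the fraction of system outputs whose detected language is not the intended target language. The abstract does not say how the authors quantify the issue, so this is an assumption, not their metric. The names `off_target_rate` and `toy_detect` are hypothetical, and the toy detector is only a stand-in for a real language identifier.

```python
from typing import Callable, Sequence


def off_target_rate(
    outputs: Sequence[str],
    target_lang: str,
    detect_lang: Callable[[str], str],
) -> float:
    """Return the fraction of outputs whose detected language is not target_lang."""
    if not outputs:
        return 0.0
    misses = sum(1 for text in outputs if detect_lang(text) != target_lang)
    return misses / len(outputs)


def toy_detect(text: str) -> str:
    # Hypothetical stand-in for a real language identifier:
    # any text containing a CJK ideograph counts as "zh", everything else as "en".
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in text) else "en"


# Four outputs from an imagined English-to-Chinese system; one stayed in English.
outputs = [
    "今天天气很好。",
    "The weather is nice today.",
    "我们去公园吧。",
    "好的。",
]
rate = off_target_rate(outputs, "zh", toy_detect)
print(f"off-target rate: {rate:.2f}")
```

In practice `detect_lang` would wrap a trained language identification model, and the rate would be reported per translation direction.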


