Large language models (LLMs) have made significant strides in various natural language processing (NLP) tasks. Recent research shows that moderately sized LLMs often outperform their larger counterparts after task-specific fine-tuning. In this work, we investigate how to adapt LLMs to specialize in document-level machine translation (DocMT) for a specific language pair. First, we explore how prompt strategies affect downstream translation performance. Then, we conduct extensive experiments with two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our findings indicate that in some cases these specialized models even surpass GPT-4 in translation performance. In other cases, they still suffer significantly from off-target translation, even when fine-tuned exclusively on bilingual parallel documents. Furthermore, we provide an in-depth analysis of these DocMT-tailored LLMs, covering translation errors, the scaling law of parallel documents, out-of-domain generalization, and the impact of zero-shot cross-lingual transfer. These findings shed light on the strengths and limitations of LLM-based DocMT models and provide a foundation for future research in DocMT.