Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks. Recent research indicates that moderately sized LLMs often outperform larger ones after task-specific fine-tuning. This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs. We first investigate the impact of prompt strategies on translation performance and then conduct extensive experiments using two fine-tuning methods, three LLM backbones, and 18 translation tasks across nine language pairs. Our results show that specialized models can sometimes surpass GPT-4 in translation performance but still suffer from issues such as off-target translation caused by error propagation during decoding. We provide an in-depth analysis of these LLMs tailored for DocMT, examining translation errors, discourse phenomena, training and inference strategies, the data efficiency of parallel documents, evaluation on recent test sets, and zero-shot cross-lingual transfer. Our findings highlight the strengths and limitations of LLM-based DocMT models and provide a foundation for future research.
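To make the prompting setup concrete, the sketch below shows one common way to prompt an instruction-tuned causal LM for document-level translation with greedy decoding via Hugging Face transformers. This is a minimal illustration, not the paper's exact configuration: the model checkpoint, prompt template, and example document are all placeholders. Note that greedy, left-to-right decoding over an entire document is precisely where decoding errors can compound, which is one plausible source of the off-target translations mentioned above.

```python
# A minimal sketch of document-level prompting for MT, assuming a generic
# instruction-tuned causal LM. Checkpoint and template are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; any instruction-tuned checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Concatenate the source sentences into a single prompt so the model can use
# cross-sentence context -- the core idea that distinguishes DocMT from
# sentence-by-sentence translation.
document = [
    "The committee met on Tuesday.",
    "It postponed the vote until next week.",  # "It" needs document context to translate correctly
]
prompt = (
    "Translate the following English document into German.\n\n"
    + " ".join(document)
    + "\n\nGerman:"
)

inputs = tokenizer(prompt, return_tensors="pt")
# do_sample=False gives greedy decoding; one early mistake conditions all
# later tokens, which is how errors propagate through a long document.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
translation = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(translation)
```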