Recent advances in neural machine translation (NMT) have revolutionized the field, yet the dependence on extensive parallel corpora limits progress for low-resource languages. Cross-lingual transfer learning offers a promising solution by leveraging data from high-resource languages, but it often struggles with in-domain NMT. In this paper, we investigate three pivotal aspects: enhancing the domain-specific quality of NMT by fine-tuning on domain-relevant data from different language pairs, identifying which domains transfer in zero-shot scenarios, and assessing the impact of language-specific versus domain-specific factors on adaptation effectiveness. Using English as the source language and Spanish as the fine-tuning target language, we evaluate translation into multiple target languages, including Portuguese, Italian, French, Czech, Polish, and Greek. Our findings reveal significant improvements in domain-specific translation quality, especially in specialized fields such as medical, legal, and IT, underscoring the importance of well-defined domain data and a transparent experimental setup for in-domain transfer learning.