This paper presents the submission of Huawei Translation Center (HW-TSC) to the WMT24 Indian Languages Machine Translation (MT) Shared Task. To build reliable machine translation systems for low-resource Indian languages, we adopted two distinct knowledge transfer strategies. The choice between them depended on the characteristics of each language's script and on how well existing open-source models already support the language. For Assamese (as) and Manipuri (mn), we fine-tuned the open-source IndicTrans2 model for bidirectional translation between English and each of these languages. For Khasi (kh) and Mizo (mz), we first trained a multilingual baseline model on the bilingual data of these four language pairs, together with an additional English-Bengali bilingual corpus of about 8kw sentence pairs; all of these languages share certain linguistic features. We then fine-tuned this baseline for bidirectional English-Khasi and English-Mizo translation. Fine-tuning IndicTrans2 yielded 23.5 BLEU for en-as, 31.8 BLEU for en-mn, 36.2 BLEU for as-en, and 47.9 BLEU for mn-en on the respective test sets. Transfer from the multilingual model yielded 19.7 BLEU for en-kh, 32.8 BLEU for en-mz, 16.1 BLEU for kh-en, and 33.9 BLEU for mz-en on the respective test sets. These results show that transfer learning is effective for low-resource languages, and they contribute to advancing machine translation for low-resource Indian languages.